Skip to content

Commit 5dfe9d2

Browse files
nealcardwellkuba-moo
authored andcommitted
tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO
Testing determined that the recent commit 9e046bb ("tcp: clear tp->retrans_stamp in tcp_rcv_fastopen_synack()") has a race, and does not always ensure retrans_stamp is 0 after a TFO payload retransmit. If transmit completion for the SYN+data skb happens after the client TCP stack receives the SYNACK (which sometimes happens), then retrans_stamp can erroneously remain non-zero for the lifetime of the connection, causing a premature ETIMEDOUT later. Testing and tracing showed that the buggy scenario is the following somewhat tricky sequence: + Client attempts a TFO handshake. tcp_send_syn_data() sends SYN + TFO cookie + data in a single packet in the syn_data skb. It hands the syn_data skb to tcp_transmit_skb(), which makes a clone. Crucially, it then reuses the same original (non-clone) syn_data skb, transforming it by advancing the seq by one byte and removing the FIN bit, and enques the resulting payload-only skb in the sk->tcp_rtx_queue. + Client sets retrans_stamp to the start time of the three-way handshake. + Cookie mismatches or server has TFO disabled, and server only ACKs SYN. + tcp_ack() sees SYN is acked, tcp_clean_rtx_queue() clears retrans_stamp. + Since the client SYN was acked but not the payload, the TFO failure code path in tcp_rcv_fastopen_synack() tries to retransmit the payload skb. However, in some cases the transmit completion for the clone of the syn_data (which had SYN + TFO cookie + data) hasn't happened. In those cases, skb_still_in_host_queue() returns true for the retransmitted TFO payload, because the clone of the syn_data skb has not had its tx completetion. + Because skb_still_in_host_queue() finds skb_fclone_busy() is true, it sets the TSQ_THROTTLED bit and the retransmit does not happen in the tcp_rcv_fastopen_synack() call chain. + The tcp_rcv_fastopen_synack() code next implicitly assumes the retransmit process is finished, and sets retrans_stamp to 0 to clear it, but this is later overwritten (see below). + Later, upon tx completion, tcp_tsq_write() calls tcp_xmit_retransmit_queue(), which puts the retransmit in flight and sets retrans_stamp to a non-zero value. + The client receives an ACK for the retransmitted TFO payload data. + Since we're in CA_Open and there are no dupacks/SACKs/DSACKs/ECN to make tcp_ack_is_dubious() true and make us call tcp_fastretrans_alert() and reach a code path that clears retrans_stamp, retrans_stamp stays nonzero. + Later, if there is a TLP, RTO, RTO sequence, then the connection will suffer an early ETIMEDOUT due to the erroneously ancient retrans_stamp. The fix: this commit refactors the code to have tcp_rcv_fastopen_synack() retransmit by reusing the relevant parts of tcp_simple_retransmit() that enter CA_Loss (without changing cwnd) and call tcp_xmit_retransmit_queue(). We have tcp_simple_retransmit() and tcp_rcv_fastopen_synack() share code in this way because in both cases we get a packet indicating non-congestion loss (MTU reduction or TFO failure) and thus in both cases we want to retransmit as many packets as cwnd allows, without reducing cwnd. And given that retransmits will set retrans_stamp to a non-zero value (and may do so in a later calling context due to TSQ), we also want to enter CA_Loss so that we track when all retransmitted packets are ACked and clear retrans_stamp when that happens (to ensure later recurring RTOs are using the correct retrans_stamp and don't declare ETIMEDOUT prematurely). Fixes: 9e046bb ("tcp: clear tp->retrans_stamp in tcp_rcv_fastopen_synack()") Fixes: a7abf3c ("tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()") Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Yuchung Cheng <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
1 parent 84b767f commit 5dfe9d2

File tree

1 file changed

+27
-11
lines changed

1 file changed

+27
-11
lines changed

net/ipv4/tcp_input.c

Lines changed: 27 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -2782,13 +2782,37 @@ static void tcp_mtup_probe_success(struct sock *sk)
27822782
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPMTUPSUCCESS);
27832783
}
27842784

2785+
/* Sometimes we deduce that packets have been dropped due to reasons other than
2786+
* congestion, like path MTU reductions or failed client TFO attempts. In these
2787+
* cases we call this function to retransmit as many packets as cwnd allows,
2788+
* without reducing cwnd. Given that retransmits will set retrans_stamp to a
2789+
* non-zero value (and may do so in a later calling context due to TSQ), we
2790+
* also enter CA_Loss so that we track when all retransmitted packets are ACKed
2791+
* and clear retrans_stamp when that happens (to ensure later recurring RTOs
2792+
* are using the correct retrans_stamp and don't declare ETIMEDOUT
2793+
* prematurely).
2794+
*/
2795+
static void tcp_non_congestion_loss_retransmit(struct sock *sk)
2796+
{
2797+
const struct inet_connection_sock *icsk = inet_csk(sk);
2798+
struct tcp_sock *tp = tcp_sk(sk);
2799+
2800+
if (icsk->icsk_ca_state != TCP_CA_Loss) {
2801+
tp->high_seq = tp->snd_nxt;
2802+
tp->snd_ssthresh = tcp_current_ssthresh(sk);
2803+
tp->prior_ssthresh = 0;
2804+
tp->undo_marker = 0;
2805+
tcp_set_ca_state(sk, TCP_CA_Loss);
2806+
}
2807+
tcp_xmit_retransmit_queue(sk);
2808+
}
2809+
27852810
/* Do a simple retransmit without using the backoff mechanisms in
27862811
* tcp_timer. This is used for path mtu discovery.
27872812
* The socket is already locked here.
27882813
*/
27892814
void tcp_simple_retransmit(struct sock *sk)
27902815
{
2791-
const struct inet_connection_sock *icsk = inet_csk(sk);
27922816
struct tcp_sock *tp = tcp_sk(sk);
27932817
struct sk_buff *skb;
27942818
int mss;
@@ -2828,14 +2852,7 @@ void tcp_simple_retransmit(struct sock *sk)
28282852
* in network, but units changed and effective
28292853
* cwnd/ssthresh really reduced now.
28302854
*/
2831-
if (icsk->icsk_ca_state != TCP_CA_Loss) {
2832-
tp->high_seq = tp->snd_nxt;
2833-
tp->snd_ssthresh = tcp_current_ssthresh(sk);
2834-
tp->prior_ssthresh = 0;
2835-
tp->undo_marker = 0;
2836-
tcp_set_ca_state(sk, TCP_CA_Loss);
2837-
}
2838-
tcp_xmit_retransmit_queue(sk);
2855+
tcp_non_congestion_loss_retransmit(sk);
28392856
}
28402857
EXPORT_SYMBOL(tcp_simple_retransmit);
28412858

@@ -6295,8 +6312,7 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack,
62956312
tp->fastopen_client_fail = TFO_DATA_NOT_ACKED;
62966313
skb_rbtree_walk_from(data)
62976314
tcp_mark_skb_lost(sk, data);
6298-
tcp_xmit_retransmit_queue(sk);
6299-
tp->retrans_stamp = 0;
6315+
tcp_non_congestion_loss_retransmit(sk);
63006316
NET_INC_STATS(sock_net(sk),
63016317
LINUX_MIB_TCPFASTOPENACTIVEFAIL);
63026318
return true;

0 commit comments

Comments
 (0)