-
-
Notifications
You must be signed in to change notification settings - Fork 33.3k
Closed
Labels
netIssues and PRs related to the net subsystem.Issues and PRs related to the net subsystem.
Description
I found this issue while testing how our app handles a network interruption and noticed this behavior with using node sockets in Linux. I have verified this issue happens with both 0.10.41
and 4.2.3
.
Reproduce Setup
- Need to setup two docker containers. Container deps: update openssl to 1.0.1j #1 hosts the TCP echo service and container test: don't remove empty.txt on win32 #2 runs the socket write code.
- In the echo container I installed
ecco
- The socket write container will run the app but it needs to have
iptables
andnetstat
to run the test. - Run the test container test: don't remove empty.txt on win32 #2 on the host network and make it privilaged.
--net=host --privileged
- In our test we need to wait for an
ETIMEDOUT
which can take 15 minutes with the default setting. To make the test bearable runsysctl -w net.ipv4.tcp_retries2=8
in the test container to make this around 100s instead.
Dockerfile
FROM node:0.10.41
RUN apt-get update && apt-get install -y iptables net-tools
RUN npm install -g ecco
CMD ["bash"]
Socket write code
#!/usr/bin/env node
"use strict"
var net = require('net');
var MESSAGE_RATE = 1000;
var RANDOM_DATA = 700;
var host = '172.17.0.4';
var port = '9999';
var debug = require('debug')('SocketTest');
var socket = net.createConnection(port, host);
socket.addr = host + ':' + port;
socket.host = host;
socket.port = port;
socket.on('connect', function () {
debug(this.address())
var lastError = this.error;
this.error = null;
if (lastError) {
debug('reconnect');
} else {
startLoop()
}
});
socket.on('error', function (err) {
this.error = err;
debug('error', err);
});
socket.on('close', function (had_error) {
debug('close');
retry(this);
});
socket.on('end', function () {
debug('Socket End')
retry(this);
});
socket.buffer = new Buffer([]);
socket.on('data', function (data) {
debug('received data', data)
});
socket.on('drain', function(){
debug('socket drained');
});
socket.setKeepAlive(true, 20000);
function retry(s) {
if(s.retrying || s.closing) return;
s.retrying = true;
s.retryTimer = setTimeout(function () {
s.retrying = false;
s.connect(s.port, s.host);
}, 1000);
}
var messageNumber = 0
var lastTimer = null
function startLoop() {
if (!socket.error) {
var currentNumber = messageNumber
debug('Sending message #'+ messageNumber)
var ret = socket.write(new Array(RANDOM_DATA).join(Math.random().toString(36)) + ' Received message # ' + messageNumber++, sendCb)
debug('Socket write %s', ret ? 'flushed to kernel': 'queued in memory');
debug('Socket %d bytes written', socket.bytesWritten);
debug('Socket %s bytes read', socket.bytesRead);
debug('Socket bufferSize %d', socket.bufferSize);
} else {
debug('socket error not writing.')
}
lastTimer = setTimeout(startLoop, MESSAGE_RATE);
function sendCb(){
debug('Callback called for #%d', currentNumber);
currentNumber = null
}
}
Reproduce Steps
- Start the echo app on container deps: update openssl to 1.0.1j #1.
ecco -l -v -p 9999 --address 0.0.0.0
- Make note of the IP address for that container using
awk 'NR==1 {print $1}' /etc/hosts
- Update the IP in the test code (in my case it's
172.17.0.4
) and run the test app - Observe the data streaming in the echo container's terminal
- In container test: don't remove empty.txt on win32 #2 add the firewall rule to block the echo server's IP
iptables -A INPUT -s "172.17.0.4" -j DROP;iptables -A OUTPUT -s "172.17.0.4" -j DROP;
- Notice the data stops streaming in the echo terminal. This is expected behavior.
- Run
netstat -napc
in container test: don't remove empty.txt on win32 #2 and observe the socket'sSend-Q
increasing over time. - Wait until the
ETIMEDOUT
occurs and then remove the rule.iptables -D INPUT -s "172.17.0.4" -j DROP;iptables -D OUTPUT -s "172.17.0.4" -j DROP;
- Verify the rule is removed
iptables -nvL
- After a bit notice container deps: update openssl to 1.0.1j #1 says
client connected
after container test: don't remove empty.txt on win32 #2 reconnects. - The socket appears to write out data but no data appears to the echo terminal. This is unexpected.
- Apply the block rule again and do netstat again. We expect the
Send-Q
of the new connection to accumulate but it does not. This is also unexpected.
(don't forget to remove the rule after testing)
Observations
- This issue doesn't happen on a Mac. I don't have an windows enviorment so I have not tested there.
- This issue also does not happen when you send small amounts of data to socket.write. if you change the
RANDOM_DATA
to a lower number like 70 it will recover. - In
0.10.41
I noticedprocess._errno
contains anECANCELED
error and doesn't look like socket code errors out the socket when it receives this error string from the handle.
Metadata
Metadata
Assignees
Labels
netIssues and PRs related to the net subsystem.Issues and PRs related to the net subsystem.