Skip to content

Conversation

@lsgunth
Copy link
Contributor

@lsgunth lsgunth commented Aug 22, 2023

Using broadcast messages with local echo almost works with asynchronous clients but it is racy.

If the echo for a broadcast comes in after the next transaction has been sent, then the sent_buffer gets overridden and the echo of the broadcast is forgotten. This corrupts the data stream.

This PR has a couple commits to make it more robust: for one allow the echo to be split along multiple packets, then ensure a broadcast waits for the echo before returning from async_execute.

The last two commits add tests for both these issues.

Copy link
Collaborator

@janiversen janiversen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems you have missed out that local echo must work for both client and server, since transport is used in both.

@janiversen
Copy link
Collaborator

I do not understand your racing condition...there are no "next transaction" before a response is received from the first transaction.

"local echo" does not involve any serial/network communication but are purely at device driver level in the local computer, so there are no delay in receiving the data.

Are you basing this PR on a real life problem ? if so please open an issue, with the debug logs,

Your PR in general is well done, so I am sorry be a bit negative.

Copy link
Collaborator

@janiversen janiversen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test!

@lsgunth
Copy link
Contributor Author

lsgunth commented Aug 23, 2023

I do not understand your racing condition...there are no "next transaction" before a response is received from the first transaction.

When sending a broadcast that will result in an echo, the current code sets the send_buffer to what was previously sent. The broadcast packet does not wait for any response and returns immediately after the packet is sent. The user code continues while the packet is transmitting on the line, often before the local echo is fully received. If the user code sends another packet immediately after the broadcast packet, it will overwrite send_buffer with the new packet before the echo was received. When the echo of the broadcast packet is then received it is not seen as an echo because send_buffer contains the contents of the next packet. This results in communication failures seeing there's unexpected data on the bus.

"local echo" does not involve any serial/network communication but are purely at device driver level in the local computer, so there are no delay in receiving the data.

I'm working with hardware that uses an MAX13487E with auto direction control to a half-duplex RS485 bus. The auto direction control is necessary seeing the microprocessor cannot control the direction quick enough to reliably receive the other devices' responses. Auto direction control automatically controls whether the transmitter is enabled but it doesn't do anything for the receiver, and thus everything sent on the bus will be received by the hardware. In this case then, the local echo is at the hardware level, cannot be disabled and is delayed for as long as it takes to send the buffer and receive it back through the kernel. Assuming that a local echo is always instant is wrong.

Are you basing this PR on a real life problem ? if so please open an issue, with the debug logs,

Yes, this is a real life problem. I'm not able to capture the issue with debug logs enabled seeing it changes the timing: it adds enough delay that the local echo is received before the next packet is sent.

Sorry, I should have included more context in the PR.

@janiversen
Copy link
Collaborator

janiversen commented Aug 23, 2023

Broadcast do not exist in modbus, especially not in rs485 communication, you can set slave=0 meaning address all devices on the line.

The client still looks for a response and waits until timeout, at least when you use the mixin methods.

If you send multiple requests on the serial line without giving the devices time to respond (go from listen mode to write mode), and that breaks the modbus protocol.

Part of the PR makes sense to me, especially since the added test shows a clear bug, but the future is not acceptable nor that transport logic goes into client base. The transport class is to be seen as a closed box with a few methods and a few callbacks, and it handles everything relevant to to the transport layer.

You could e.g. add to send_buffer instead of overwriting it since everything will be echoed.

@janiversen
Copy link
Collaborator

Just made a test with client_async_calls, change SLAVE=0x00 (broadcast as you call it) and ran it on a serial line.

All requests waited for a response....could the problem be, you forgotten to await the client call: "await client.read_.....", that would allow your code to continue, but break a lot of things in pymodbus.

@lsgunth
Copy link
Contributor Author

lsgunth commented Aug 23, 2023

Broadcast do not exist in modbus, especially not in rs485 communication, you can set slave=0 meaning address all devices on the line.

Broadcasts do exist in serial modbus:

In broadcast mode, the master can send a request to all slaves.
No response is returned to broadcast requests sent by the master. The broadcast requests are necessarily writing commands. All devices must accept the broadcast for writing function. The address 0 is reserved to identify a broadcast exchange.

(From MODBUS over serial line specification and implementation guide V1.02 -- page 7. Broadcasts are also referenced in other specifications.)

In fact pymodbus supports this behaviour -- it is just optional. You must specify broadcast_enable=True when creating the client. This works, and does not wait for the response in the current pymodbus code. You can see this in the code I'm changing in async_execute(), it simply returns the text "Broadcast write sent - no response expected" and does not wait for any response. This implementation just has this bug I'm trying to solve when it is combined with handle_local_echo=True.

I suspect the test you tried did not set broadcaste_enable in the client. Then you would see what you are seeing. I can tell you that I am definitely await-ing the calls correctly.

Broadcast support on the server side is different. In order for a server to support broadcasts it has to handle requests with a slave ID of zero and not send the response for such requests. There is a similar broadcast_enable flag in the server context and is implemented here. The server side does not have an equivalent of the problem I'm trying to solve with handle_local_echo seeing it doesn't actually send anything for broadcast packets.

If you prefer the approach of appending to send_buffer, I can try that. However I won't get to it until next week.

@janiversen
Copy link
Collaborator

you are right, and I stand corrected....however your idea about the server is a bit off, remember a server can sit on a RS485 with many other devices so it is quite a lot more complicated.

Anyhow, my main point is, Keep transport clean and do not spread transport issues into the upper levels. We have worked hard the last few releases to clean especially the client code !

When you have it ready, then add me as reviewer.

@janiversen
Copy link
Collaborator

Just passed over transport, the solution is quite simple.

All you need to do is in transport_send check if send_buffer is empty and wait if not.

@lsgunth
Copy link
Contributor Author

lsgunth commented Aug 23, 2023

All you need to do is in transport_send check if send_buffer is empty and wait if not.

Ok, that sounds reasonable. Just a quick question:

As it is now, transport_send() does not have any knowledge of whether it is a broadcast request or not.

If it doesn't know it's a broadcast then we are adding an additional wait on every request which is unnecessary and reduces efficiency.

Would you be ok with passing a broadcast flag to this function which adds the wait in when handle_local_echo is also true?

@janiversen
Copy link
Collaborator

It should not know about broadcast the is message layer.

It only needs to wait if send_buffer contains something, for non broadcast that will not be the case so no problem

@lsgunth
Copy link
Contributor Author

lsgunth commented Aug 23, 2023

It only needs to wait if send_buffer contains something, for non broadcast that will not be the case so no problem

send_buffer will be set for all messages when handle_local_echo is true, not just broadcast messages. In 99% of the cases it does not need to wait for the local echo seeing the code will eventually be waiting for a response which includes that. The wait is only required when broadcast packets are sent and handle_local_echo is true. send_buffer will still be set for non-broadcast cases.

In my opinion adding a wait for all packets is just adding inefficiencies to common use cases when it's only really needed in the niche case. So I think it's worthwhile passing a broadcast flag to transport_send() or finding another way.

@janiversen
Copy link
Collaborator

you forget in all other cases, there are no extra send until data is received which will clear the send_buffer so there are no extra wait.

But it's your PR so feel free to do what you want, I will review what you submit. To me it is important to keep the layers separated, and broadcast relate to message handling not transport,

@janiversen
Copy link
Collaborator

We are getting close to release time, so if you want this in please solve the outstanding issue, and request a new review.

The local echo code lookes for the local echo inside a received datagram.
However, this can be racy, especially with serial ports. It's always
the case that the local echo should start the transaction, but it's
possible for a partial echo to arrive.

Handle partial echos by dropping a part of the sent_buffer and print
an error if the local echo is not actually received when it is expected.
Test that a local echo received over two datagrams is handled correctly.
When using broadcast packets with the local echo, the echo is not waited for.
If transport_send() gets called again before the echo is received then
`sent_buffer` is overwritten and the echo that might be coming from the
broadcast packet will be forgotten about.

To fix this, always append the expected echo to `sent_buffer`. Thus if there is
an existing echo that has not been received yet it is not forgotten about and
subsequent receives will do the right thing.
Broadcast packets do not wait for a response and thus any local echo
will be stuck in the recieve buffer. Test to ensure a subsequent packet
can be sent after the broadcast and that the local echo is handled
correctly.

Three tests are done: one where the broadcast echo and the subsequent message
echo are returned in two calls to datagram_recieved(), one where the entire
response is in one call, and one where the echo for the message packet
is split over two datagram_received() calls.
@lsgunth lsgunth force-pushed the local_echo_async_broadcast branch from 70883ee to 837750d Compare August 28, 2023 16:53
@lsgunth lsgunth requested a review from janiversen August 28, 2023 16:53
@lsgunth
Copy link
Contributor Author

lsgunth commented Aug 28, 2023

I attempted solving the problem in the way that we discussed, however, it was not possible. transport_send() is not an async function so awaiting on a future is not possible in that function. transport_send() also called in places that were not async functions so it was not possible to easily convert it to one. (The classic function colouring problem.)

Instead, I've pushed a fix using the other solution I alluded to: append to sent_buffer in transport_send. This solves the problem by not dropping the expected echo of the broadcast packet. The difference between this solution and the previous one is that sending a broadcast packet will not wait for the echo for that packet. It will instead be processed in due course, perhaps prepended to the receive buffer in the next packet. As best I can tell, there isn't an issue with this.

Copy link
Collaborator

@janiversen janiversen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

Looking forward to see more PRs from you, this one was not an easy one and we are always looking for a helping hand.

@janiversen janiversen merged commit 7ecd481 into pymodbus-dev:dev Aug 28, 2023
@lsgunth
Copy link
Contributor Author

lsgunth commented Aug 28, 2023

Thanks!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants