
Conversation

@janiversen
Collaborator

thanks to @lsgunth

lsgunth and others added 3 commits August 30, 2023 12:20
The code in async_execute() in ModbusBaseClient appears to be written
to retry, but the send is not in the loop, so the function never retries.

To fix this, put the transport_send() command inside the while loop.
Add tests to ensure retries send data the appropriate number of times,
and that a timeout with ModbusIOException occurs after sufficient
retries are sent.
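The fix described above could be sketched roughly as follows. This is an illustrative stand-in, not the actual pymodbus code: `async_execute`, `transport_send`, and the helper signatures are simplified, and only the shape of the bug (the send sitting outside the retry loop) is reproduced:

```python
import asyncio

class ModbusIOException(Exception):
    """Raised when no response arrives after all retries."""

async def async_execute(send, wait_response, retries=3, timeout=1.0):
    # Before the fix, the send happened once, outside the loop, so a
    # timeout never triggered a re-send. Putting it inside the loop
    # means every retry transmits the request again.
    for attempt in range(retries + 1):
        send()  # transport_send() equivalent: now inside the loop
        try:
            return await asyncio.wait_for(wait_response(), timeout)
        except asyncio.TimeoutError:
            continue
    raise ModbusIOException(f"no response after {retries} retries")
```

A test along the lines described would count how many times `send` runs and assert that it matches the retry count.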
@janiversen janiversen merged commit c526a4b into dev Aug 30, 2023
@janiversen janiversen deleted the async_retry branch August 30, 2023 10:54
@lsgunth
Contributor

lsgunth commented Aug 30, 2023

One of the reasons I closed my PR is that the sync path should really be the same as the async path. I assumed based on your reasoning in #1750 that the sync path didn't send retries either. A cursory read this morning of transaction.py suggests that the Sync version of the code does in fact send retries, though it would be good to confirm this.

Similarly, if we are adding the no_resend_on_retry parameter, it would make sense to add it to the sync path as well, not just to the async path.

Though, IMHO I find the new parameter a bit odd and I'm not sure what the use case for it is. If it's dangerous for a client to send retries, then make sure retries=0 (based on your reasoning, I think that would be a sane default, although changing it might break existing applications, so that may not be possible). If, at that point, you can't send retries but need to wait 3 times as long for responses, then just increase the timeout.

@janiversen
Collaborator Author

Currently the sync path and the async path are very different in terms of implementation. We try not to update the sync path unless it's for bug fixes.

The new transport layer, when completed with a message layer, will be used for both sync and async; basically the sync client will be a thin shell over the async client, leaving us with only one implementation, just like it was done with the server a couple of versions back.

Retries is there to retry getting a response... now it can also send the request again. I opted for an extra parameter because of compatibility.
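The compatibility argument could be sketched like this. The names (`execute_with_retries`, `no_resend_on_retry`) are illustrative, not the actual pymodbus signature: with the flag set, the loop only keeps waiting (the old behaviour); without it, each retry re-sends the request:

```python
import asyncio

async def execute_with_retries(send, wait_response, *, retries=3,
                               timeout=1.0, no_resend_on_retry=False):
    send()  # the initial request is always transmitted once
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(wait_response(), timeout)
        except asyncio.TimeoutError:
            # Re-send only if the caller did not opt out; with
            # no_resend_on_retry=True we merely keep waiting, which
            # matches the pre-fix behaviour.
            if not no_resend_on_retry and attempt < retries:
                send()
    raise TimeoutError(f"no response after {retries} retries")
```

Existing applications that relied on a single transmission keep that behaviour by passing the flag, which is the compatibility point being made.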

Anyhow, if you feel it should be different, then feel free to submit a PR.

@janiversen
Collaborator Author

Ohh, btw, when using a USB adapter we have seen many times that 1 read with a large timeout fails, but multiple reads each with a 1 second timeout work... Don't ask me to explain why, this is just experience.

@lsgunth
Contributor

lsgunth commented Aug 30, 2023

That must be sync only? The async path doesn't actually do reads in that way, it just polls the kernel constantly for data to come in and then times out on a future waiting for a complete packet. The USB adapter can't have anything to do with that.

The USB adapter issue sounds like a kernel/driver bug or perhaps something in pyserial. It's possible it's been fixed.

From an API perspective though, I would handle that in the lower levels and not have parameters exposed for the user to worry about. I'm not sure how the sync timeouts work in this code base, but it would make sense to have a timeout on the request that's independent of the read timeout. Then it wouldn't depend on a buggy large read timeout.
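One way to read that suggestion: give each request its own deadline, tracked with a monotonic clock, so the overall wait never depends on how the driver honours a single long read timeout. A minimal sketch under that assumption (the helper names and the frame-complete check are illustrative, not pymodbus code):

```python
import time

def request_with_deadline(read_some, request_timeout=3.0, read_timeout=0.2):
    """Collect response bytes using many short bounded reads, but give
    up when the per-request deadline passes, independent of any single
    read timeout."""
    deadline = time.monotonic() + request_timeout
    buffer = b""
    while time.monotonic() < deadline:
        chunk = read_some(read_timeout)  # short read; blocks at most read_timeout
        if chunk:
            buffer += chunk
            if buffer.endswith(b"\r\n"):  # stand-in for a real frame check
                return buffer
    raise TimeoutError("request deadline exceeded")
```

The hardware only ever sees short reads; the long wait is enforced purely by the clock comparison.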

@janiversen
Collaborator Author

I think you are mixing things. The async path uses transport/protocol and asyncio, no polling.

For async serial we have a special implementation, which we inherited from pyserial, because that project is dead.

The sync version does polling until it dies, and we pay a performance penalty.

Retries need to be at the user level; they are the only ones who know what they want, and likewise they are the only ones who know if a request is to be considered "dangerous".

If you take a look at transport.py, you see the lowest level of retry (in this case the reconnect mechanism). The retry (and resend) mechanism is in the message layer, which guarantees the sending/receiving of a full frame; however, we still need user input, like how many retries.
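The layering described here could be pictured as: the transport layer only knows how to reconnect, while resending requests is a message-layer concern. A toy sketch of a transport-level reconnect loop with backoff (illustrative only, not the actual transport.py code):

```python
import time

def connect_with_backoff(try_connect, max_attempts=5,
                         first_delay=0.1, backoff=2.0):
    """Lowest-level retry: reconnect with exponential backoff.
    Resending requests is a message-layer concern, not handled here."""
    delay = first_delay
    for attempt in range(1, max_attempts + 1):
        if try_connect():
            return attempt  # connected on this attempt
        time.sleep(delay)
        delay *= backoff  # wait longer before the next attempt
    raise ConnectionError(f"gave up after {max_attempts} attempts")
```

The user-supplied knob in this sketch is `max_attempts`, mirroring the point that the retry count itself has to come from the user.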

@janiversen
Collaborator Author

janiversen commented Aug 30, 2023

And the USB problem was actually on both sync/async, but before the new transport layer.

I have one of these cheap Chinese ones, and can reproduce that and a couple of other problems (the USB adapter has been changed on my production system).

@lsgunth
Contributor

lsgunth commented Aug 30, 2023

I think you are mixing things. The async path uses transport/protocol and asyncio, no polling.

When I say poll I mean the kernel [e]poll() or similar call. This polling happens in the async layer; that's how async works. The poll signals when there is data available on the file descriptor, which the async code then reads in the appropriate amount in a non-blocking way. There are no read timeouts in this method (if we were stuck waiting for a timeout on a read call, the async code wouldn't be very asynchronous). Ultimately this is implemented in the Python async code with an epoll() call with a timeout. The timeout is pure kernel functionality, so it can't depend on the underlying hardware at all. So it doesn't make any sense to me how a hardware quirk could cause the behaviour you describe in the async path.
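The mechanism being described, in miniature: the event loop (epoll or similar underneath) delivers bytes via `data_received` callbacks, and the caller just awaits a future for the complete frame, with the timeout on the future rather than on any read call. A self-contained sketch using simulated byte delivery instead of a real serial port (the frame-complete check is a stand-in):

```python
import asyncio

class FrameProtocol(asyncio.Protocol):
    """Non-blocking receive: the loop delivers bytes via data_received;
    nothing ever sits in a blocking read with a timeout."""
    def __init__(self):
        self.buffer = b""
        self.frame = asyncio.get_running_loop().create_future()

    def data_received(self, data):
        self.buffer += data
        if self.buffer.endswith(b"\r\n"):  # stand-in frame-complete check
            self.frame.set_result(self.buffer)

async def main():
    proto = FrameProtocol()
    loop = asyncio.get_running_loop()
    # Simulate the kernel delivering the response in two chunks.
    loop.call_soon(proto.data_received, b"resp")
    loop.call_later(0.01, proto.data_received, b"onse\r\n")
    # The timeout lives on the future, not on any read() call.
    return await asyncio.wait_for(proto.frame, timeout=1.0)
```

Because the timeout wraps the future, the wait is pure event-loop bookkeeping and never reaches the hardware, which is the point being argued.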

The sync version does polling until it dies, and we pay a performance penalty.

I've only done some cursory reading, but it looks like the sync code constantly asks the kernel how many bytes are waiting, with a time.sleep() in a while loop. I can see how you might call this polling, but it is not what I meant when I said poll.

I can see how this would perform poorly; I would not have done it this way. But I also can't see how this could be a problem with the underlying hardware. If the timeout in __wait_for_data is large, then it'll just sleep for a longer time, and it has nothing to do with hardware or the underlying drivers. There's no opportunity for the hardware to return a failure for a long timeout. I'd suggest you debug this adapter again, as there might be a bug in the pymodbus code.
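The pattern being described might look like the following hypothetical reconstruction (not the actual `__wait_for_data`): ask the driver how many bytes are buffered, sleep briefly, and repeat until enough data arrives or the deadline passes:

```python
import time

def wait_for_data(in_waiting, min_bytes, timeout, poll_interval=0.01):
    """Sync-style polling loop: repeatedly ask how many bytes are
    buffered, sleeping between checks. A large timeout only means more
    sleeping; the hardware never sees the timeout value."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if in_waiting() >= min_bytes:
            return True
        time.sleep(poll_interval)
    return False
```

As the comment notes, the timeout here is consumed entirely by `time.sleep`, which is why a large value cannot by itself cause a hardware failure.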

Retries need to be at the user level; they are the only ones who know what they want, and likewise they are the only ones who know if a request is to be considered "dangerous".
If you take a look at transport.py, you see the lowest level of retry (in this case the reconnect mechanism). The retry (and resend) mechanism is in the message layer, which guarantees the sending/receiving of a full frame; however, we still need user input, like how many retries.

I never said there shouldn't be a retry parameter. Of course the user needs to specify how many retries they want. I said I wouldn't work around a weird hardware debug problem by exposing an interface to allow the user to do confusing "retried reads" instead of just specifying a longer timeout.

@janiversen
Collaborator Author

OK, I thought you meant we were polling... Last time I checked the asyncio implementation, it would actually use a device driver callback hook on a newer Linux kernel; epoll is the good old call which even exists on Windows.

You are right, I too would have done the sync code differently, but to be fair this code is 15+ years old, so I am happy it just works.

We (or more correctly I) have been driving a rewrite effort during the last year, and we are slowly making progress. Changing the server to be async only was a monster task, but it has paid off; only having one code base makes maintenance a lot easier.

It is quite obvious that you know a lot; it would be interesting if I could get you interested in helping with the rewrite effort. The message layer (the framer directory in the current code base) is quite a chunk, because that is where most of the special handling happens... it would be really nice to have someone like you to ping-pong with.

The design of the message layer is a bit non-standard; I have started making a number of test scenarios that use the message layer as a black box in order to make sure the API is efficient and sufficient.

@lsgunth
Contributor

lsgunth commented Aug 30, 2023

Thanks for your efforts on the rewrite. I have noticed the pymodbus code is indeed challenging due to the amount of code duplicated between the sync and async paths, and following the paths can sometimes be rather tangled.

Unfortunately, I don't have a lot of time to work on a lot of cleanup in this project. I just mostly need to fix bugs so I can get my project working. Though, I've used this library a few times in the past in smaller ways, so I do appreciate that it exists and is being maintained. I'd be happy to discuss issues if it is of use to you.

I'm not really following what your plans are in the message layer. I can say that when I'm refactoring codebases my preferred approach is to do lots of small commits which mostly move code around into new classes or functions. Small changes where a review of the commit can reasonably easily tell that the functionality before and after is the same. PR #1648 seems to have one commit that has unrelated changes in it and mostly adds code without the corresponding usage in the code base. As a reviewer, this makes it very difficult for me to tell if it matches the functionality of the code it is meant to replace or even if it is a positive improvement.

Writing and improving tests is always worth every bit of effort.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 10, 2023