-
Notifications
You must be signed in to change notification settings - Fork 234
[RFC] add iterator based blocking Write
traits
#47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Write
traitsWrite
traits
@japaric I was already thinking of doing exactly that myself. Very much in favour of it and yes, it's especially useful for serial, too. It's almost worth considering ditching the slice syntax, since one can easily iterate over a slice, too. |
I concur, having to collect the iterator on linux systems shouldn't be a problem. It may be even possible to add some non breaking adjustments to spidev and i2cdev to make this unnecessary. I haven't checked this, though. |
@therealprof I don't think ditching the slice syntax is a good idea - what a slice provides that an iterator does not is a guarantee of size, which is very handy and can be more performant in some cases. I've been working with PoC implementation of this for i2c with the stm32f30x crate, and because there's no clean way to find the length of an iterator (we can get hints, but no strong promise of size), there's a noticeable difference of the behavior of the two implementations. That said, I'm excited to see this and will keep experimenting with it - it makes writing the code on the user's side of things much cleaner and more flexible. |
@scowcron Is it the same result in --release mode for you? For example if you have something like this: let mut my_buf = [0u8; 32];
spi.write(my_buf.iter()); Does the hardware level performance change versus the regular slice performance? I know Rust/LLVM will typically compile this to the same code, but it would be good to have actual hardware results. The hope (that we discussed in IRC) was that we could lean on the compiler to optimize this, so that you only "pay" for the iterator when you really need it (like when you have chained operations, which would have overhead anyway). |
Here were two Compiler Explorer links that tried to illustrate that: https://godbolt.org/g/BAziBB <= slice vs iter (same asm for x86) If you have a logic analyzer or scope (@scowcron), it would be cool to see the difference of sending some chunk of data with the old trait vs the new trait :) @therealprof I don't actually see a great reason to remove the old sliced based impl (other than "splitting the ecosystem"), and might be useful to have as an option if someone has a case where the optimizer is having a bad day. |
@jamesmunns Chances are most implementation will implement one on the hardware and base the other on that. Not sure your comparison doesn't suffer from side-effects of the you're comparing. In my experience iterators often not only generate the same or even better code than a slice based implementation (because the type is not narrowed down from the beginning), it also opens up the mind for new creative iterator uses which are often much faster than the usual indexed for loops. E.g. this was my old (non-hal) serial write implementation for the STM32F042:
Or here my current Rng implementation for the nrf51:
|
@scowcron I have made exactly the opposite experience and pretty much never work with the size of slices. Do you have an example? NB: In my I2C implementation for the nrf51 I'm also using an iterator over the passed in slice to clock out the bytes. |
@therealprof @jamesmunns I retract my previous assertion about a performance hit, compiling with --release flag speeds things up nicely, even with the iterator-based approach. There is an extra step you have to take with the stm32f30x family at very least where you have to manage an NBYTES and RELOAD field in the control register, but the processor is fast enough that it doesn't make a difference. |
In the simplest case it shouldn't make much of a difference. However once you start using the capabilities of the iterators that changes rather quickly... |
I think that would make a lot of sense. I have some concerns about having two APIs for basically doing the same thing, so I like the thought of removing the slice-based API. Never saw a case where it was required to know the size of the input, but my experience isn't broad enough for that to mean much. |
@hannobraun As @scowcron mentioned you might need to tell the MCU, e.g. a STM32, how many bytes you would like to send for the internal state machine to work. However IIRC you can also update this particular counter on the fly (e.g. to transfer more than 255 bytes at a time) if you can live with the overhead. |
@therealprof Thanks for the clarification! |
It seems to me like the iterator approach falls over for DMA based operations? The example of I2C with header / content is often implemented by making two calls to the underlying I2C driver and having a flag for whether the transaction should be terminated after each call. I'd argue that shifting arrays in/out is the lowest common denominator for most peripherals, it's explicit about what memory is being shared, and if you want to implement iterators and assembly of data that's an application level (or at least, above driver) problem. |
@ryankurte Not sure I get what you're saying. Not only is DMA special anyway, it is typically not used for blocking operations, which this is about. |
Oops, you're right. I often end up blocking on DMA transfers so servicing ISRs doesn't impact transfer timings, but I think that example is better served by wrapping the non-blocking API anyway. |
Is there any reason not to merge this right now? As far as I can tell, nobody is opposed to this proposal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I think the only question for me was about the naming, e.g. WriteIterRead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, too. Unless someone objects, I'm going to merge this in a few days.
Motivation
Some use cases require sending to a I2C / SPI slave some header (e.g.
u16
) followed by a databuffer (
&[u8]
).With the current API, which only works with slices (
&[u8]
), this requires the user to copy boththe header and the original buffer in a slightly larger buffer and then send that buffer to the
slave. This approach may result in a large memcpy (between stack allocated buffers) at runtime. It
also requires the caller to allocate a temporary buffer large enough which can sometimes be
impossible (size of data buffer is only known at runtime) or wasteful (allocate a buffer that fits
both the header and the largest possible size of the data buffer).
A more appropriate API for this use case is one that accepts iterators as arguments. Then the user
can write something like this:
Instead of something like this:
Detailed design
This RFC proposes adding one iterator-based trait for each
*Write*
trait in theblocking::{i2c,spi}
modules. The API of these traits is similar to their slice counterparts exceptthat they take a generic
impl IntoIterator<Item = W>
argument instead of an slice (&[W]
).For the full signature of the traits check the contents of the PR.
Drawbacks
i2cdev
andspidev
would have to implement these using dynamic memory allocation as their APIsonly work with slices.
But Linux devices are not as resource constrained as microcontroller so this is probably not a
problem.
Alternatives
Add
write_iter
as another method ofWrite
. This is totally doable but it would be a breakingchange because there's no way to provide a default implementation of
write_iter
based onwrite
(at least without allocating).
Unresolved questions
Bikeshed the names of the traits / methods.
Should we add a
WriteIter
counterpart toblocking::serial::Write
? cc @hannobrauncc @wose @jamesmunns