
[RFC] add iterator based blocking Write traits #47


Merged
merged 2 commits into from
Jul 17, 2018

Conversation

japaric
Member

@japaric japaric commented Feb 28, 2018

Motivation

Some use cases require sending an I2C / SPI slave a header (e.g. a u16) followed by a data
buffer (&[u8]).

With the current API, which only works with slices (&[u8]), the user has to copy both the header
and the original buffer into a slightly larger buffer and then send that buffer to the slave. This
approach may result in a large memcpy (between stack-allocated buffers) at runtime. It also
requires the caller to allocate a temporary buffer that is large enough, which can sometimes be
impossible (the size of the data buffer is only known at runtime) or wasteful (allocating a buffer
that fits both the header and the largest possible size of the data buffer).

A more appropriate API for this use case is one that accepts iterators as arguments. Then the user
can write something like this:

use core::iter;

let header: u8 = 0xAB;
let data_buffer: &[u8] = ..;

spi.write_iter(iter::once(header).chain(data_buffer.iter().cloned()))?;

Instead of something like this:

let header: u8 = 0xAB;
let data_buffer: &[u8] = ..;

// is this large enough? is it too large?
let mut temp_buffer: [u8; 128] = [0; 128];
temp_buffer[0] = header;
// this will panic if the buffer is not large enough
temp_buffer[1..data_buffer.len() + 1].copy_from_slice(&data_buffer);

spi.write(&temp_buffer)?;

Detailed design

This RFC proposes adding one iterator-based trait for each *Write* trait in the
blocking::{i2c,spi} modules. The API of these traits is similar to that of their slice
counterparts, except that they take a generic impl IntoIterator<Item = W> argument instead of a
slice (&[W]).

For the full signature of the traits check the contents of the PR.

Drawbacks

i2cdev and spidev would have to implement these using dynamic memory allocation as their APIs
only work with slices.

impl spi::WriteIter<u8> for Spidev {
    type Error = Error;

    fn write_iter<WI>(&mut self, words: WI) -> Result<(), Self::Error>
    where
        WI: IntoIterator<Item = u8>
    {
        let v: Vec<u8> = words.into_iter().collect();
        // defer to spi::Write
        self.write(&v)
    }
}

But Linux devices are not as resource constrained as microcontrollers, so this is probably not a
problem.

Alternatives

Add write_iter as another method of Write. This is totally doable, but it would be a breaking
change because there's no way to provide a default implementation of write_iter in terms of write
(at least not without allocating).

Unresolved questions

  • Bikeshed the names of the traits / methods.

  • Should we add a WriteIter counterpart to blocking::serial::Write? cc @hannobraun

cc @wose @jamesmunns

@japaric japaric changed the title [RFC] add iterator based Write traits [RFC] add iterator based blocking Write traits Feb 28, 2018
@therealprof
Contributor

@japaric I was already thinking of doing exactly that myself. Very much in favour of it and yes, it's especially useful for serial, too. It's almost worth considering ditching the slice syntax, since one can easily iterate over a slice, too.

@wose

wose commented Mar 1, 2018

I concur; having to collect the iterator on Linux systems shouldn't be a problem. It may even be possible to add some non-breaking adjustments to spidev and i2cdev to make this unnecessary. I haven't checked this, though.

@scowcron

scowcron commented Mar 1, 2018

@therealprof I don't think ditching the slice syntax is a good idea. What a slice provides that an iterator does not is a guarantee of size, which is very handy and can be more performant in some cases. I've been working with a PoC implementation of this for I2C with the stm32f30x crate, and because there's no clean way to find the length of an iterator (we can get hints, but no strong promise of size), there's a noticeable difference in the behavior of the two implementations.

That said, I'm excited to see this and will keep experimenting with it - it makes writing the code on the user's side of things much cleaner and more flexible.

@jamesmunns
Member

@scowcron Is it the same result in --release mode for you?

For example if you have something like this:

let mut my_buf = [0u8; 32];
spi.write_iter(my_buf.iter().cloned());

Does the hardware-level performance change versus the regular slice performance? I know Rust/LLVM will typically compile this to the same code, but it would be good to have actual hardware results.

The hope (that we discussed in IRC) was that we could lean on the compiler to optimize this, so that you only "pay" for the iterator when you really need it (like when you have chained operations, which would have overhead anyway).

@jamesmunns
Member

jamesmunns commented Mar 1, 2018

Here are two Compiler Explorer links that try to illustrate that:

https://godbolt.org/g/BAziBB <= slice vs iter (same asm for x86)
https://godbolt.org/g/gCYhkS <= iter vs chained iter, chained iter has way more overhead

If you have a logic analyzer or scope (@scowcron), it would be cool to see the difference of sending some chunk of data with the old trait vs the new trait :)

@therealprof I don't actually see a great reason to remove the old slice-based impl (other than "splitting the ecosystem"), and it might be useful to have as an option if someone has a case where the optimizer is having a bad day.

@therealprof
Contributor

@jamesmunns Chances are most implementations will implement one on the hardware and base the other on it. I'm not sure your comparison doesn't suffer from side effects of the two things you're comparing. In my experience iterators not only often generate the same or even better code than a slice-based implementation (because the type is not narrowed down from the beginning), they also open the mind to new creative iterator uses which are often much faster than the usual indexed for loops.

E.g. this was my old (non-hal) serial write implementation for the STM32F042:

        for c in s.as_bytes() {
            /* Wait until the USART is clear to send */
            while usart1.isr.read().txe().bit_is_clear() {}

            /* Write the current character to the output register */
            usart1.tdr.write(|w| unsafe { w.bits(u32::from(*c)) });
        }

Or here my current Rng implementation for the nrf51:

        for in_ in &mut buffer.into_iter() {
            /* Let's wait until we have a new random value */
            while self.rng.events_valrdy.read().bits() == 0 {}

            /* Write fetched random number into provided buffer */
            *in_ = self.rng.value.read().bits() as u8;

            /* Clear event for next random number value */
            self.rng.events_valrdy.write(|w| unsafe { w.bits(0) });
        }

@therealprof
Contributor

@scowcron My experience has been exactly the opposite; I pretty much never work with the size of slices. Do you have an example?

NB: In my I2C implementation for the nrf51 I'm also using an iterator over the passed in slice to clock out the bytes.

@scowcron

scowcron commented Mar 2, 2018

@therealprof @jamesmunns I retract my previous assertion about a performance hit; compiling with the --release flag speeds things up nicely, even with the iterator-based approach. There is an extra step you have to take with the stm32f30x family at the very least, where you have to manage the NBYTES and RELOAD fields in the control register, but the processor is fast enough that it doesn't make a difference.

@therealprof
Contributor

In the simplest case it shouldn't make much of a difference. However once you start using the capabilities of the iterators that changes rather quickly...

@hannobraun
Member

Should we add a WriteIter counterpart to blocking::serial::Write?

I think that would make a lot of sense. I have some concerns about having two APIs for basically the same thing, so I like the thought of removing the slice-based API. I never saw a case where it was required to know the size of the input, but my experience isn't broad enough for that to mean much.

@therealprof
Contributor

@hannobraun As @scowcron mentioned you might need to tell the MCU, e.g. a STM32, how many bytes you would like to send for the internal state machine to work. However IIRC you can also update this particular counter on the fly (e.g. to transfer more than 255 bytes at a time) if you can live with the overhead.

@hannobraun
Member

@therealprof Thanks for the clarification!

@ryankurte
Contributor

ryankurte commented Mar 5, 2018

It seems to me like the iterator approach falls over for DMA-based operations.
Usually for a DMA transfer you provide an (aligned) slice of data to be read or written; with the iterator approach you're going to have to double up on storage and copy into or from a DMA buffer.
edit: for DMA you also need the size etc. in the initial setup.

The example of I2C with header / content is often implemented by making two calls to the underlying I2C driver and having a flag for whether the transaction should be terminated after each call.

I'd argue that shifting arrays in/out is the lowest common denominator for most peripherals; it's explicit about what memory is being shared, and if you want to implement iterators and assembly of data, that's an application-level (or at least, above-driver) problem.

@therealprof
Contributor

@ryankurte Not sure I get what you're saying. Not only is DMA special anyway, it is typically not used for blocking operations, which this is about.

@ryankurte
Contributor

Oops, you're right. I often end up blocking on DMA transfers so servicing ISRs doesn't impact transfer timings, but I think that example is better served by wrapping the non-blocking API anyway.

@hannobraun
Member

Is there any reason not to merge this right now? As far as I can tell, nobody is opposed to this proposal.

Contributor

@therealprof therealprof left a comment


Looks good to me. I think the only question for me was about the naming, e.g. WriteIterRead.

Member

@hannobraun hannobraun left a comment


Looks good to me, too. Unless someone objects, I'm going to merge this in a few days.


7 participants