Skip to content

Add std::io::util #10895

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 13, 2013
Merged

Add std::io::util #10895

merged 1 commit into from
Dec 13, 2013

Conversation

sfackler
Copy link
Member

This adds a bunch of useful Reader and Writer implementations. I'm not a
huge fan of the name util but I can't think of a better name and I
don't want to make std::io any longer than it already is.

}
}

impl Reader for ChainedReader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason this is not next to the ChainedReader declaration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A typo :). Updated.

@alexcrichton
Copy link
Member

This looks really good to me, nice work!

I'd like to get another pair or two of eyes on this though. I'm always a little wary of adding new util modules, but I agree that all of these are very useful utilities which we should have. The only other useful utility-style thing which I would want is a function to copy a reader into a writer. Other than that, this seems pretty complete to me (but others should weigh in).

One possible way to get rid of util would be to just reexport these under std::io directly.

@sfackler
Copy link
Member Author

@alexcrichton I've added a copy function in. It's currently using the same uninitialized buffer trick that BufferedReader and BufferedWriter do. It does seem a bit dangerous to me since a Reader can lie about the amount of data it's given you and you'll end up looking at uninitialized memory by accident, but I don't know if that's something worth worrying about. It may be worth making an unsafe vec::bytes::with_size_uninitialized function since we're doing this in two places.

I'm not a huge fan of reexporting since it just seems to confuse the documentation.

@sfackler
Copy link
Member Author

@alexcrichton updated.

@alexcrichton
Copy link
Member

Another option I just thought of is a couple of newline-conversion utilities. Something like a reader -> reader converting newlines or a writer -> writer converting newlines (to \r\n or the other way).

@bill-myers
Copy link
Contributor

ChainedReader should take an iterator instead of a vector.

Those who want to use a vector can then pass its move_iter(), which will result in something that behaves exactly like the current version, and it's also possible to pass a custom iterator that lazily creates the readers.

Plus, this way readers that are done with are destroyed immediately with no extra code rather than kept around unused.

For instance, this would allow to concatenate files named "file.000", "file.001", "file.002", etc. and opening and closing each file at a time, while the current code requires to have all of them open at once, which is worse and might even fail due to OS open file limits.

Also, there should probably be another version that takes only two readers, but with a concrete type put in a type parameter (like TeeWriter, except it takes two readers instead of a writer).

A NullReader/EmptyReader would complete the set.

LimitReader should probably take the reader by value instead of &mut just like TeeWriter does (or, if taking by &mut were deemed best, which I think would be a mistake, then TeeWriter should take by &mut).

@sfackler
Copy link
Member Author

I'll look into switching ChainedReader over to taking an iterator.

LimitReader takes the reader by reference because (I think) the common use case will be something like

let mut r = some_reader;
let len = r.read_be_i32() as uint;
let some_struct = read_some_struct(LimitedReader::new(len, &mut r));
...

Where the reader is just loaned out temporarily. If LimitedReader took the reader by value, you'd need to add a couple of extra lines of wrapping and unwrapping.

@bill-myers
Copy link
Contributor

&mut Reader implements Reader, so that code should continue working even if LimitedReader takes an <R: Reader> by value (in that code, we'd have R = &mut SomeReader, which implements Reader).

That's why I think passing by value (where the type is a generic argument) is the correct choice, since it is strictly more general.

In fact, I believe that ChainedReader should even be ChainedReader<R: Reader, I: Iterator> so that it can be passed any of Iterator, Iterator<&mut Reader> or Iterator<~Reader>.

@sfackler
Copy link
Member Author

@bill-myers &mut Reader implements Reader but &mut R where R: Reader does not.

match self.cur_reader {
Some(ref mut r) => {
match r.read(buf) {
Some(len) => return Some(len),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the behaviour of this is slightly peculiar, e.g., say you had two 100 byte files, it would take two calls to this .read to fill a [u8, .. 200], right?

Shouldn't this function be attempting to fill the buffer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't seem safe. What happens if the first read returns valid data and the second one raises an error? I don't think it'd be good to throw away the data that it was able to read. read_to_end exists if you explicitly want to do that, and another method on Reader that repeatedly read to fill the buffer could be added as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, true enough.

This adds a bunch of useful Reader and Writer implementations. I'm not a
huge fan of the name `util` but I can't think of a better name and I
don't want to make `std::io` any longer than it already is.
bors added a commit that referenced this pull request Dec 13, 2013
This adds a bunch of useful Reader and Writer implementations. I'm not a
huge fan of the name `util` but I can't think of a better name and I
don't want to make `std::io` any longer than it already is.
@bors bors closed this Dec 13, 2013
@bors bors merged commit 7fe5e30 into rust-lang:master Dec 13, 2013
@sfackler sfackler deleted the io-util branch December 23, 2013 03:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants