Fix a bug that GZipReader#gets may return incomplete line #32
See also: ruby/csv#117 (comment)

How to reproduce with x.csv.gz in the issue comment:

Zlib::GzipReader.open("x.csv.gz") do |rio|
  rio.gets(nil, 1024)
  while line = rio.gets(nil, 8192)
    raise line unless line.valid_encoding?
  end
end

Reported by Dimitrij Denissenko. Thanks!!!
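The reproduction above depends on the x.csv.gz attachment from the linked issue. A self-contained check in the same spirit might look like the sketch below. It assumes an in-memory gzip stream of multibyte UTF-8 text; the helper name `read_gzip_in_chunks` and the sample data are hypothetical and not part of this pull request.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper (not part of zlib): gzip `text` in memory, then read it
# back with GzipReader#gets(nil, limit), collecting every returned chunk.
def read_gzip_in_chunks(text, limit)
  compressed = Zlib.gzip(text)
  reader = Zlib::GzipReader.new(StringIO.new(compressed),
                                external_encoding: Encoding::UTF_8)
  chunks = []
  begin
    while (chunk = reader.gets(nil, limit))
      chunks << chunk
    end
  ensure
    reader.close
  end
  chunks
end

# Assumed sample data: 3-byte UTF-8 characters, so a 4-byte limit regularly
# falls in the middle of a character.
text = "あいうえお\n" * 100
chunks = read_gzip_in_chunks(text, 4)
broken = chunks.reject(&:valid_encoding?)
puts "chunks: #{chunks.size}, broken: #{broken.size}"
```

On a zlib build with correct boundary handling, no chunk should be reported as broken, and joining the chunks should give back the original text.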
@nobu What do you think about this?
If the filled size equals the reading size,

diff --git a/ext/zlib/zlib.c b/ext/zlib/zlib.c
index f9af18f530e..8b6b802e09d 100644
--- a/ext/zlib/zlib.c
+++ b/ext/zlib/zlib.c
@@ -4198,12 +4198,15 @@ static long
 gzreader_charboundary(struct gzfile *gz, long n)
 {
     char *s = RSTRING_PTR(gz->z.buf);
-    char *e = s + ZSTREAM_BUF_FILLED(&gz->z);
-    char *p = rb_enc_left_char_head(s, s + n, e, gz->enc);
+    long f = ZSTREAM_BUF_FILLED(&gz->z);
+    int boundary = (f == n);
+    char *e = s + f;
+    char *p = rb_enc_left_char_head(s, s + n - boundary, e, gz->enc);
     long l = p - s;
     if (l < n) {
         n = rb_enc_precise_mbclen(p, e, gz->enc);
         if (MBCLEN_NEEDMORE_P(n)) {
+            l += boundary;
             if ((l = gzfile_fill(gz, l + MBCLEN_NEEDMORE_LEN(n))) > 0) {
                 return l;
             }
Sorry, I didn't see that there was already a patch.
Umm, I think that When
When
I think that it's intentional. I think that Anyway, I'm not familiar with the zlib code base. If you think that your patch is the right approach, could you push it? I'm OK with any approach that doesn't return an incomplete line.
Fix: align to the character of
It seems that https://github.com/ruby/ruby/blob/b9f7286fe95827631b11342501e471e5e6f13bbb/io.c#L3751

pp = rb_enc_left_char_head(s, p-1, p, enc);

File.open("/tmp/x", "w") do |output|
  output.puts("あい")
end
File.open("/tmp/x") do |input|
  p input.gets(nil, 4) # This uses the 4th byte (the first byte of "い"), not the 5th byte
end
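The byte arithmetic in the comment above can be modeled in plain Ruby. The sketch below is a simplified, UTF-8-only illustration of aligning on the character that contains the last requested byte (byte n-1) and then extending to the end of that character. The method name `utf8_chunk_length` is hypothetical; the C code relies on `rb_enc_left_char_head` and `rb_enc_precise_mbclen` and works for any encoding, so this is not a port of it.

```ruby
# Hypothetical model, UTF-8 only: given a limit of `n` bytes, return how many
# bytes must be read so that the chunk ends on a character boundary.
# Assumes `str` is valid UTF-8 and `n` is positive.
def utf8_chunk_length(str, n)
  total = str.bytesize
  return total if n >= total

  # Start at the last requested byte and walk left over continuation
  # bytes (10xxxxxx) until the head byte of that character is found.
  head = n - 1
  head -= 1 while head > 0 && (str.getbyte(head) & 0xC0) == 0x80

  # Derive the character length from its head byte.
  lead = str.getbyte(head)
  len = if lead < 0x80 then 1
        elsif lead >= 0xF0 then 4
        elsif lead >= 0xE0 then 3
        else 2
        end

  [head + len, total].min
end

sample = "あい\n" # 3 + 3 + 1 = 7 bytes
p utf8_chunk_length(sample, 4)
```

With a limit of 4, byte n-1 is the first byte of "い", so the chunk is extended to cover all of "い". Aligning on byte n instead would start from the second byte of "い"; in this sample that resolves to the same character, but when a character ends exactly at byte n-1 it would select the following character.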