BzDecoder does not seem to propagate UnexpectedEof error from truncated streams #411

@baszalmstra

Description

The BzDecoder in async-compression silently accepts truncated bzip2 streams and returns success with 0 bytes decompressed, rather than raising an error like the synchronous bzip2::read::BzDecoder does. This behavior can lead to silent data corruption in applications that rely on proper error handling for data integrity.

Reproducer Repository

I've created a minimal reproducer that can be run with a single command:

Repository: https://github.com/baszalmstra/async-compression-bzip2-truncation-issue

git clone https://github.com/prefix-dev/async-compression-bzip2-truncation-issue.git
cd async-compression-bzip2-truncation-issue
cargo run

Expected vs Actual Behavior

Expected: When a bzip2 stream is truncated/incomplete, the decoder should return an error indicating that decompression failed due to an incomplete stream.

Actual: The async BzDecoder returns Ok(0) when reading a truncated stream, indicating success but with 0 bytes decompressed. No error is raised.

Output from the Reproducer

=== Demonstrating async-compression BzDecoder truncation issue ===

Original data size: 5400 bytes
Compressed data size: 142 bytes
Truncated data size: 71 bytes (50% of compressed)

--- Test 1: Sync bzip2::read::BzDecoder ---
✗ Sync decoder failed (this is expected for truncated data)
  Error: Custom { kind: UnexpectedEof, error: "decompression not finished but EOF reached" }

--- Test 2: Async async-compression BzDecoder ---
✓ Async decoder succeeded
  Bytes decompressed: 0
  Output size: 0 bytes
  🔴 ISSUE: No error was raised despite receiving truncated data!

As shown above:

  • Sync bzip2::read::BzDecoder: Correctly fails with UnexpectedEof error
  • Async async-compression BzDecoder: Silently succeeds with 0 bytes (no error!)

Impact

This is a serious data integrity issue because:

  1. Silent data loss: Applications cannot detect that decompression failed
  2. Cache corruption: Package managers and download tools may cache incomplete/corrupted data as "valid"
  3. Security concerns: Applications expecting data validation through error handling will silently accept corrupted streams

Proposed Solution

The BzDecoder should propagate EOF/truncation errors from the underlying bzip2 decompressor, similar to how the sync bzip2::read::BzDecoder handles this case. When the stream ends prematurely, it should return an error like:

Err(io::Error::new(
    io::ErrorKind::UnexpectedEof,
    "decompression not finished but EOF reached"
))
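
To make the proposed rule concrete, here is a minimal std-only sketch of "input EOF before the stream is finished becomes UnexpectedEof". It is not the async-compression or bzip2 implementation: `LengthPrefixedDecoder`, `truncated`, and `decode` are hypothetical names, and the toy format (one length byte followed by that many payload bytes) only stands in for a real compressed stream that knows whether it has reached its end marker.

```rust
use std::io::{self, Read};

/// Toy stand-in for a decompressor (NOT bzip2): the first byte of the
/// stream declares how many payload bytes follow, so the decoder always
/// knows whether the stream is complete.
struct LengthPrefixedDecoder<R> {
    inner: R,
    /// Payload bytes still owed; `None` until the header byte is read.
    remaining: Option<usize>,
}

impl<R: Read> LengthPrefixedDecoder<R> {
    fn new(inner: R) -> Self {
        Self { inner, remaining: None }
    }
}

/// Builds the error suggested in the issue text.
fn truncated() -> io::Error {
    io::Error::new(
        io::ErrorKind::UnexpectedEof,
        "decompression not finished but EOF reached",
    )
}

impl<R: Read> Read for LengthPrefixedDecoder<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Lazily read the one-byte header on the first call.
        let remaining = match self.remaining {
            Some(n) => n,
            None => {
                let mut header = [0u8; 1];
                if self.inner.read(&mut header)? == 0 {
                    // Input ended before the header: treat as truncated.
                    return Err(truncated());
                }
                let n = header[0] as usize;
                self.remaining = Some(n);
                n
            }
        };

        // Stream finished (or nothing requested): a clean Ok(0) is correct.
        if remaining == 0 || buf.is_empty() {
            return Ok(0);
        }

        let want = remaining.min(buf.len());
        let got = self.inner.read(&mut buf[..want])?;
        if got == 0 {
            // Key step: the input hit EOF while payload is still owed, so
            // report an error instead of passing Ok(0) through as success.
            return Err(truncated());
        }
        self.remaining = Some(remaining - got);
        Ok(got)
    }
}

/// Decodes a whole in-memory stream, surfacing truncation as an error.
fn decode(input: &[u8]) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    LengthPrefixedDecoder::new(input).read_to_end(&mut out)?;
    Ok(out)
}
```

With this sketch, `decode(&[3, b'a', b'b', b'c'])` yields the three payload bytes, while `decode(&[3, b'a'])` returns an error of kind `UnexpectedEof` rather than a short, apparently successful read. The analogous check in an async decoder would presumably sit at the point where the underlying reader reports end of input while the decompressor has not yet signalled end of stream.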
