Raises "EI stream not found" while reading RunLengthDecode (RL) inline image #3517

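I am trying to extract the text of a PDF, but `page.extract_text()` raises `PdfReadError: EI stream not found` on a page that contains a RunLengthDecode (RL) inline image.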

## Environment

```bash
$ python -m platform
Linux-6.16.12+deb14+1-amd64-x86_64-with-glibc2.36

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==6.2.0, crypt_provider=('cryptography', '3.4.8'), PIL=9.0.1
```

## Code + PDF

This is a minimal, complete example that shows the issue:

```python
from pypdf import PdfReader
reader = PdfReader("/path-to-file.pdf")

for page in reader.pages:
    text = page.extract_text()
```

The PDF is [cedolini_esempio-1.pdf](https://github.com/user-attachments/files/23900687/cedolini_esempio-1.pdf).

While debugging, I found that the inline image data being parsed is:
```
\x00\xf8\xff\x00\x00\x02\xfe\xff\x00\x80\xff\x00\x00?\x00\xff\x00\xfe\xfe\x00\xfc\xff\x00\x80\xff\x00\x00[...]\xfbU\x00\x7f\x80\r\nEI
```
The problem seems to be that `extract_inline_RL` (https://github.com/py-pdf/pypdf/blob/85b53d8eb014d1c6363a71401cebfadd9d7300b0/pypdf/generic/_image_inline.py#L131) stops at the first `\x80` byte it finds, which here lies inside the image data, so the bytes that follow are not the expected `EI` operator.
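
To illustrate the suspicion above, here is a minimal sketch (not pypdf's code) that compares a plain byte search for `\x80` with a scan that walks the run-length packets. It assumes the usual RunLengthDecode layout: a length byte of 0 to 127 is followed by `length + 1` literal bytes, 129 to 255 is followed by one byte to repeat, and 128 marks the end of data. The helper name `find_rl_eod` is hypothetical, and the sample is the prefix from the dump above with a made-up tail (`\x80\r\nEI`) appended.

```python
def find_rl_eod(data: bytes) -> int:
    """Return the index of the RunLengthDecode EOD byte (128), or -1 if absent.

    Hypothetical helper for illustration only; it is not part of pypdf.
    """
    i = 0
    while i < len(data):
        length = data[i]
        if length == 128:
            return i  # a length byte of 128 is the real EOD marker
        if length < 128:
            i += 1 + (length + 1)  # skip the length byte and the literal run
        else:
            i += 2  # skip the length byte and the single repeated byte
    return -1


# Prefix copied from the issue, followed by an assumed tail: EOD, CRLF, EI.
sample = b"\x00\xf8\xff\x00\x00\x02\xfe\xff\x00\x80\xff\x00\x00?" + b"\x80\r\nEI"

naive = sample.find(b"\x80")  # first 0x80 anywhere in the buffer
aware = find_rl_eod(sample)   # first 0x80 that sits in a length-byte position

print(naive, aware)
```

In this sample the plain search stops at a `\x80` that is the payload of a one-byte literal run, while the packet-aware scan only stops at a `\x80` in a length-byte position, which is directly followed by whitespace and `EI`.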

I read the PDF reference (https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.0.pdf), which says:
> The value 128 is placed at the end of the compressed data, as an EOD marker.

but I can't see such a value (128).

Why don't we look for `EI` directly, as the default handler does (https://github.com/py-pdf/pypdf/blob/85b53d8eb014d1c6363a71401cebfadd9d7300b0/pypdf/generic/_image_inline.py#L199)?
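
For reference, this is roughly what I mean by looking for `EI` directly. It is only a sketch under my own assumptions, not the code of pypdf's default handler: the helper name `split_at_ei` is hypothetical, and the pattern treats the six PDF whitespace bytes as delimiters around `EI`. The approach is heuristic, because compressed image data could itself contain whitespace, `EI`, whitespace by coincidence.

```python
import re

# "EI" preceded by a PDF whitespace byte and followed by whitespace or end of data.
EI_PATTERN = re.compile(rb"[\x00\t\n\x0c\r ]EI(?=[\x00\t\n\x0c\r ]|\Z)")


def split_at_ei(data: bytes) -> tuple[bytes, int]:
    """Return (image_bytes, offset_of_EI) for the first delimited EI operator.

    Hypothetical helper for illustration only; it is not part of pypdf.
    """
    match = EI_PATTERN.search(data)
    if match is None:
        raise ValueError("EI operator not found")
    # match.start() is the whitespace byte before "EI"; the image ends there.
    return data[: match.start()], match.start() + 1


# Made-up content stream fragment: image bytes, newline, EI, then another operator.
fragment = b"\x00\xf8\xff\x00\x00\x80\xff\x00\x80\nEI\nQ"
image, ei_offset = split_at_ei(fragment)
print(len(image), ei_offset)
```

Here the literal `\x80` inside the image does not matter, because the split is decided only by the delimited `EI` token.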

## Traceback

This is the relevant part of the traceback I see:

```
...
    for value in page.extract_text().split():
  File "/usr/local/lib/python3.10/site-packages/pypdf/_page.py", line 2043, in extract_text
    return self._extract_text(
  File "/usr/local/lib/python3.10/site-packages/pypdf/_page.py", line 1726, in _extract_text
    for operands, operator in content.operations:
  File "/usr/local/lib/python3.10/site-packages/pypdf/generic/_data_structures.py", line 1406, in operations
    self._parse_content_stream(BytesIO(self._data))
  File "/usr/local/lib/python3.10/site-packages/pypdf/generic/_data_structures.py", line 1285, in _parse_content_stream
    ii = self._read_inline_image(stream)
  File "/usr/local/lib/python3.10/site-packages/pypdf/generic/_data_structures.py", line 1328, in _read_inline_image
    data = extract_inline_RL(stream)
  File "/usr/local/lib/python3.10/site-packages/pypdf/generic/_image_inline.py", line 142, in extract_inline_RL
    raise PdfReadError("EI stream not found")
pypdf.errors.PdfReadError: EI stream not found
```
