bug/opencv-python should be headless to avoid dependency on Xorg #2503

Closed
tigerinus opened this issue Feb 4, 2024 · 12 comments
Labels
bug Something isn't working pdf

Comments

@tigerinus

tigerinus commented Feb 4, 2024

Describe the bug

I get the following error when loading PDF files in a container image that will be hosted in the cloud:

  ...
  File "/DATA/junk/test2/lib/python3.11/site-packages/unstructured/partition/auto.py", line 81, in <module>
    from unstructured.partition.pdf import partition_pdf
  File "/DATA/junk/test2/lib/python3.11/site-packages/unstructured/partition/pdf.py", line 76, in <module>
    from unstructured.partition.ocr import (
  File "/DATA/junk/test2/lib/python3.11/site-packages/unstructured/partition/ocr.py", line 6, in <module>
    import cv2
ImportError: libGL.so.1: cannot open shared object file: No such file or directory

However, libGL.so.1 is part of the Xorg binaries. We could switch to a full Linux distro to resolve this, but a better option is to list opencv-python-headless in the dependency requirements instead of opencv-python.

@tigerinus tigerinus added the bug Something isn't working label Feb 4, 2024
@mhfarahani

mhfarahani commented Feb 13, 2024

I'm having the same issue when importing partition_pdf:

from unstructured.partition.pdf import partition_pdf

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[8], line 2
      1 import os
----> 2 from unstructured.partition.pdf import partition_pdf
      3 from unstructured.staging.base import elements_to_json

File /opt/conda/lib/python3.10/site-packages/unstructured/partition/pdf.py:77
     64 from unstructured.partition.common import (
     65     convert_to_bytes,
     66     document_to_element_list,
   (...)
     71     spooled_to_bytes_io_if_needed,
     72 )
     73 from unstructured.partition.lang import (
     74     check_language_args,
     75     prepare_languages_for_tesseract,
     76 )
---> 77 from unstructured.partition.pdf_image.pdf_image_utils import (
     78     annotate_layout_elements,
     79     check_element_types_to_extract,
     80     save_elements,
     81 )
     82 from unstructured.partition.pdf_image.pdfminer_processing import (
     83     merge_inferred_with_extracted_layout,
     84 )
     85 from unstructured.partition.pdf_image.pdfminer_utils import (
     86     open_pdfminer_pages_generator,
     87     rect_to_bbox,
     88 )

File /opt/conda/lib/python3.10/site-packages/unstructured/partition/pdf_image/pdf_image_utils.py:9
      6 from pathlib import PurePath
      7 from typing import TYPE_CHECKING, BinaryIO, List, Optional, Tuple, Union, cast
----> 9 import cv2
     10 import numpy as np
     11 import pdf2image

ImportError: libGL.so.1: cannot open shared object file: No such file or directory

@adi-kmt

adi-kmt commented Mar 10, 2024

Is there a workaround, @tigerinus?

@micmarty-deepsense
Contributor

@tigerinus, @mhfarahani, what base image are you using? I'd like to reproduce the described behavior on my side.

@tigerinus
Author

> @tigerinus, @mhfarahani, what base image are you using? I'd like to reproduce the described behavior on my side.

Any distro that doesn't ship the required libGL.so.1 binary should reproduce this issue.

In our case, it's a highly customized embedded Linux (Buildroot-based).
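
To check whether a given image is affected before installing anything, one option is to ask the dynamic loader directly. This is only a minimal sketch: the helper name `can_load_shared_library` is made up for illustration, and it uses nothing beyond the standard-library `ctypes` module.

```python
import ctypes


def can_load_shared_library(name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library is not found or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    # libGL.so.1 is the library named in the ImportError reported in this issue.
    if can_load_shared_library("libGL.so.1"):
        print("libGL.so.1 is available on this system.")
    else:
        print("libGL.so.1 is missing; importing the GUI build of cv2 is expected to fail.")
```

Running this inside the container should show whether the base image reproduces the problem.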

@micmarty-deepsense
Contributor

As far as I can tell, the relevant dependency is layoutparser, which relies on opencv-python.
There is a similar request to yours: Layout-Parser/layout-parser#170

We have two options:
a) create a PR in their package, or
b) let them know that introducing the headless version in their repo is important/pressing, and wait until it's fixed there

@tigerinus @adi-kmt @mhfarahani
If you need a workaround now, I'd say you should modify your Dockerfiles in the following way:

# install unstructured library as usual

# uninstall the full version, install headless
RUN pip uninstall -y opencv-python opencv-contrib-python && pip install opencv-python-headless==4.8.0.76

If opencv-python-headless is not sufficient, try opencv-contrib-python-headless.

Please let me know if that helps 🤝
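
After applying the workaround, it can help to confirm which OpenCV wheels actually ended up in the environment, since a later install step may pull the GUI build back in. The sketch below is an illustration only: the helper names `find_opencv_wheels` and `gui_wheels` are hypothetical, and it relies just on the standard-library `importlib.metadata`.

```python
from importlib import metadata

# The four wheels published by the opencv-python project.
OPENCV_WHEELS = {
    "opencv-python",
    "opencv-contrib-python",
    "opencv-python-headless",
    "opencv-contrib-python-headless",
}


def _normalize(name):
    """Lower-case a distribution name and use hyphens, as pip displays it."""
    return name.lower().replace("_", "-")


def find_opencv_wheels(installed_names=None):
    """Return the sorted OpenCV wheel names among the installed distributions."""
    if installed_names is None:
        installed_names = [d.metadata["Name"] for d in metadata.distributions()]
    return sorted(
        {_normalize(n) for n in installed_names if n and _normalize(n) in OPENCV_WHEELS}
    )


def gui_wheels(found):
    """Return the non-headless wheels, which are the ones that need libGL.so.1."""
    return [n for n in found if not n.endswith("-headless")]


if __name__ == "__main__":
    found = find_opencv_wheels()
    print("OpenCV wheels installed:", found or "none")
    if gui_wheels(found):
        print("GUI builds still present:", gui_wheels(found))
```

If the script still lists a GUI build, the uninstall step probably ran before another package reinstalled it.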

@FilippTrigub

Facing the same problem. The workaround works, thank you!

@laurazpm

I've tried the workaround but now the error when importing partition_pdf is: ModuleNotFoundError: No module named 'cv2.typing'; 'cv2' is not a package
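
The "'cv2' is not a package" part of that message means Python resolved cv2 to a single module rather than a package directory, so cv2.typing cannot be imported as a submodule. A small diagnostic like the one below shows what the name resolves to. This is a sketch only: `describe_module` is a hypothetical helper, and the possible causes (leftover files from the uninstalled wheel, or an OpenCV build that predates cv2.typing) are assumptions, not something confirmed in this thread.

```python
import importlib.util


def describe_module(name):
    """Report where a top-level module resolves and whether it is a package."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"name": name, "found": False, "is_package": False, "origin": None}
    return {
        "name": name,
        "found": True,
        # Packages have a submodule search path; plain modules do not.
        "is_package": spec.submodule_search_locations is not None,
        "origin": spec.origin,
    }


if __name__ == "__main__":
    # For cv2, "is_package": False would match the error reported above.
    print(describe_module("cv2"))
```

The `origin` field shows which file is being picked up, which helps spot a stale copy in site-packages.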

@Robs-Git-Hub

Hitting the same issue. Is there any news on whether this could be changed to the headless version?

@MthwRobinson
Contributor

Thanks everyone, we're going to take a look at this.

@pjaol

pjaol commented May 23, 2024

The workaround works; just make sure you run the uninstall after you've installed your requirements:

RUN pip install -r requirements.txt
RUN pip uninstall -y opencv-python opencv-contrib-python && pip install opencv-python-headless==4.8.0.76

*edited as I forgot what project I was looking at

@christinestraub christinestraub added pdf bug Something isn't working and removed bug Something isn't working labels Jul 3, 2024
@scanny
Contributor

scanny commented Dec 16, 2024

Closing as inactive.

@scanny scanny closed this as completed Dec 16, 2024
@skr3178

skr3178 commented Mar 26, 2025

Worked well.

I got this error:

unstructured ModuleNotFoundError: No module named 'cv2.typing'

pip uninstall -y opencv-python opencv-contrib-python && pip install opencv-python-headless==4.8.0.76
