bug/opencv-python should be `headless` to avoid dependency on Xorg #2503

tigerinus · 2024-02-04T13:42:57Z

Describe the bug

Getting the following error when loading PDF files in a container image that will be hosted in the cloud:

  ...
  File "/DATA/junk/test2/lib/python3.11/site-packages/unstructured/partition/auto.py", line 81, in <module>
    from unstructured.partition.pdf import partition_pdf
  File "/DATA/junk/test2/lib/python3.11/site-packages/unstructured/partition/pdf.py", line 76, in <module>
    from unstructured.partition.ocr import (
  File "/DATA/junk/test2/lib/python3.11/site-packages/unstructured/partition/ocr.py", line 6, in <module>
    import cv2
ImportError: libGL.so.1: cannot open shared object file: No such file or directory

However, libGL.so.1 is part of the Xorg binaries. We could switch to a full Linux distro to resolve this, but a better option is to list opencv-python-headless in the dependency requirements instead of opencv-python.
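To confirm that a given image is affected before importing unstructured, a quick check along these lines may help. This is only a sketch: the function name `find_libgl` is made up for illustration, and `ctypes.util.find_library` depends on tools such as ldconfig being present, so on a very minimal image it can report "not found" even when the library exists.

```python
import ctypes.util


def find_libgl():
    """Return the GL shared library name if it can be located, else None.

    find_libgl is a hypothetical helper name, not part of unstructured or OpenCV.
    """
    # find_library("GL") looks for libGL using the platform's usual lookup
    # mechanisms and returns a name such as "libGL.so.1", or None if nothing is found.
    return ctypes.util.find_library("GL")


if __name__ == "__main__":
    found = find_libgl()
    if found is None:
        print("libGL was not found; importing the non-headless cv2 will likely fail")
    else:
        print(f"libGL found as {found}")
```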


mhfarahani · 2024-02-13T23:19:18Z

Having the same issue when importing partition_pdf

from unstructured.partition.pdf import partition_pdf

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[8], line 2
      1 import os
----> 2 from unstructured.partition.pdf import partition_pdf
      3 from unstructured.staging.base import elements_to_json

File /opt/conda/lib/python3.10/site-packages/unstructured/partition/pdf.py:77
     64 from unstructured.partition.common import (
     65     convert_to_bytes,
     66     document_to_element_list,
   (...)
     71     spooled_to_bytes_io_if_needed,
     72 )
     73 from unstructured.partition.lang import (
     74     check_language_args,
     75     prepare_languages_for_tesseract,
     76 )
---> 77 from unstructured.partition.pdf_image.pdf_image_utils import (
     78     annotate_layout_elements,
     79     check_element_types_to_extract,
     80     save_elements,
     81 )
     82 from unstructured.partition.pdf_image.pdfminer_processing import (
     83     merge_inferred_with_extracted_layout,
     84 )
     85 from unstructured.partition.pdf_image.pdfminer_utils import (
     86     open_pdfminer_pages_generator,
     87     rect_to_bbox,
     88 )

File /opt/conda/lib/python3.10/site-packages/unstructured/partition/pdf_image/pdf_image_utils.py:9
      6 from pathlib import PurePath
      7 from typing import TYPE_CHECKING, BinaryIO, List, Optional, Tuple, Union, cast
----> 9 import cv2
     10 import numpy as np
     11 import pdf2image

ImportError: libGL.so.1: cannot open shared object file: No such file or directory

adi-kmt · 2024-03-10T07:30:26Z

Is there a workaround @tigerinus ?

micmarty-deepsense · 2024-03-12T15:25:49Z

@tigerinus, @mhfarahani what base image are you using? I'd like to replicate the described behavior on my side

tigerinus · 2024-03-13T05:54:22Z

> @tigerinus, @mhfarahani what base image are you using? I'd like to replicate the described behavior on my side

Any distro that doesn't ship the required libGL.so.1 binary should reproduce this issue.

In our case, it's a highly customized embedded Linux (Buildroot-based).

micmarty-deepsense · 2024-03-13T09:55:44Z

As far as I can tell, there's a relevant dependency here: layoutparser, which relies on opencv-python.
There is already a similar request to yours: Layout-Parser/layout-parser#170

We have two options:
a) create a PR in their package, or
b) let them know that introducing the headless version in their repo is important and pressing, then wait until it's fixed there

@tigerinus @adi-kmt @mhfarahani
If you need a workaround now, I'd say you should modify your Dockerfiles in the following way:

# install unstructured library as usual

# uninstall the full version, install headless
RUN pip uninstall -y opencv-python opencv-contrib-python && pip install opencv-python-headless==4.8.0.76

If opencv-python-headless is not sufficient, try opencv-contrib-python-headless instead.

Please let me know if that helps 🤝
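After applying the workaround, it can be useful to verify which OpenCV distributions actually ended up in the environment. A minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_opencv_variants` is an assumption for illustration, and the list of distribution names is limited to the four mentioned in this thread:

```python
from importlib.metadata import PackageNotFoundError, version

# The four OpenCV distribution names mentioned in this thread.
OPENCV_DISTRIBUTIONS = (
    "opencv-python",
    "opencv-contrib-python",
    "opencv-python-headless",
    "opencv-contrib-python-headless",
)


def installed_opencv_variants():
    """Return {distribution name: version} for each OpenCV variant that is installed.

    Hypothetical helper for illustration; not part of unstructured or OpenCV.
    """
    found = {}
    for name in OPENCV_DISTRIBUTIONS:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # This variant is not installed in the current environment.
            pass
    return found


if __name__ == "__main__":
    variants = installed_opencv_variants()
    print(variants)
    if len(variants) > 1:
        print("More than one OpenCV variant is installed; they all provide the same cv2 module")
```

Ideally only a headless variant shows up after the Dockerfile step above has run.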

FilippTrigub · 2024-03-25T20:05:21Z

Facing the same problem. The workaround works, thank you!

laurazpm · 2024-04-22T10:13:43Z

I've tried the workaround but now the error when importing partition_pdf is: ModuleNotFoundError: No module named 'cv2.typing'; 'cv2' is not a package
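The "'cv2' is not a package" part of this message means Python resolved `cv2` to something that is not a regular package directory. One possible cause, which is an assumption and not confirmed anywhere in this thread, is leftover or partially removed files from swapping OpenCV variants, since they all install the same `cv2` module. The sketch below only reports how a module name resolves; `describe_module` is a hypothetical helper name.

```python
import importlib.util


def describe_module(name):
    """Classify a top-level module name as 'missing', 'package', or 'plain module'.

    Hypothetical diagnostic helper; it inspects the import spec without
    executing the module's code.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "missing"
    # Packages have submodule_search_locations set; plain modules have None,
    # which is what makes imports like "cv2.typing" fail with "is not a package".
    if spec.submodule_search_locations is not None:
        return "package"
    return "plain module"


if __name__ == "__main__":
    spec = importlib.util.find_spec("cv2")
    print("cv2 resolves as:", describe_module("cv2"))
    if spec is not None:
        print("loaded from:", spec.origin)
```

If `cv2` is reported as a plain module or the origin looks unexpected, reinstalling the headless package with pip's `--force-reinstall` option may be worth trying.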

Robs-Git-Hub · 2024-05-07T05:47:47Z

Hitting the same issue. Is there any news on whether this could be changed to the headless version?

MthwRobinson · 2024-05-23T15:24:42Z

Thanks everyone, we're going to take a look at this.

pjaol · 2024-05-23T19:38:39Z

The workaround works; just make sure that you run the uninstall after your requirements install:

RUN pip install -r requirements.txt
RUN pip uninstall -y opencv-python opencv-contrib-python && pip install opencv-python-headless==4.8.0.76

*edited as I forgot what project I was looking at

scanny · 2024-12-16T21:29:09Z

Closing as inactive.

skr3178 · 2025-03-26T20:29:41Z

Worked well.

Got error

unstructured ModuleNotFoundError: No module named 'cv2.typing'

pip uninstall -y opencv-python opencv-contrib-python && pip install opencv-python-headless==4.8.0.76


tigerinus added the bug Something isn't working label Feb 4, 2024

MthwRobinson added the needs follow up label May 23, 2024

christinestraub added pdf bug Something isn't working and removed bug Something isn't working labels Jul 3, 2024

scanny closed this as completed Dec 16, 2024
