Closed
Labels: bug
Description
Describe the bug
When ingesting a Notion page, I encounter the following error:
2024-08-18 20:49:08,267 SpawnPoolWorker-2 ERROR failed to get data associated with source doc: {"processor_config": {"reprocess": false, "verbose": true, "work_dir": "/root/.cache/unstructured/ingest/pipeline", "output_dir": "notion-ingest-output", "num_processes": 2, "raise_on_error": false}, "read_config": {"download_dir": "/root/.cache/unstructured/ingest/notion/fe36aac3b4", "re_download": false, "preserve_downloads": false, "download_only": false, "max_docs": null}, "connector_config": {"access_config": {"notion_api_key": "*******"}, "page_ids": ["***"], "database_ids": ["***"], "recursive": false}, "_source_metadata": null, "_date_processed": null, "database_id": "***", "retry_strategy_config": null, "registry_name": "notion_database", "base_filename": "/****.html", "filename": "/root/.cache/unstructured/ingest/notion/***.html", "_output_filename": "notion-ingest-output/****.json", "record_locator": null, "unique_id": "/root/.cache/unstructured/ingest/notion/fe36aac3b4/***.html"}, Database.__init__() got an unexpected keyword argument 'in_trash'
Traceback (most recent call last):
  File "/root/.pyenv/versions/bstacks/lib/python3.10/site-packages/unstructured_ingest/pipeline/source.py", line 62, in run
    return self.get_single(doc=doc, ingest_doc_dict=ingest_doc_dict)
  File "/root/.pyenv/versions/bstacks/lib/python3.10/site-packages/unstructured_ingest/pipeline/source.py", line 36, in get_single
    doc.get_file()
  File "/root/.pyenv/versions/bstacks/lib/python3.10/site-packages/unstructured_ingest/interfaces.py", line 523, in wrapper
    return func(self, *args, **kwargs)
  File "/root/.pyenv/versions/bstacks/lib/python3.10/site-packages/unstructured_ingest/utils/dep_check.py", line 45, in wrapper
    return func(*args, **kwargs)
  File "/root/.pyenv/versions/bstacks/lib/python3.10/site-packages/unstructured_ingest/connector/notion/connector.py", line 222, in get_file
    text_extraction = extract_database_html(
  File "/root/.pyenv/versions/bstacks/lib/python3.10/site-packages/unstructured_ingest/connector/notion/helpers.py", line 149, in extract_database_html
    database: Database = client.databases.retrieve(database_id=database_id) # type: ignore
  File "/root/.pyenv/versions/bstacks/lib/python3.10/site-packages/unstructured_ingest/connector/notion/client.py", line 106, in retrieve
    return Database.from_dict(data=resp)
  File "/root/.pyenv/versions/bstacks/lib/python3.10/site-packages/unstructured_ingest/connector/notion/types/database.py", line 50, in from_dict
    page = cls(
TypeError: Database.__init__() got an unexpected keyword argument 'in_trash'
The database ID, page ID, and API key are redacted here.
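The last frames of the traceback suggest the failure mode: `Database.from_dict` appears to pass the keys of the Notion API response to the `Database` constructor, so a key the response contains but the class does not declare (here `in_trash`) raises `TypeError`. The sketch below reproduces that pattern with a hypothetical, heavily simplified `Database` dataclass (not the connector's real class) and shows one tolerant alternative that keeps only declared fields.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Database:
    # Hypothetical, simplified stand-in for the connector's Database type.
    id: str
    archived: bool

    @classmethod
    def from_dict_strict(cls, data: dict[str, Any]) -> "Database":
        # Forwards every key; an undeclared key such as "in_trash" raises TypeError.
        return cls(**data)

    @classmethod
    def from_dict_tolerant(cls, data: dict[str, Any]) -> "Database":
        # Keep only keys that match declared dataclass fields and ignore the rest.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


# Example response containing a key the class does not declare.
resp = {"id": "abc", "archived": False, "in_trash": False}

try:
    Database.from_dict_strict(resp)
except TypeError as exc:
    print(exc)  # ... unexpected keyword argument 'in_trash'

print(Database.from_dict_tolerant(resp))  # Database(id='abc', archived=False)
```

Filtering keeps ingestion working but silently discards new fields; declaring the new field on the class (for example an optional `in_trash` with a default) is the alternative if its value matters.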
To Reproduce
I followed the Notion source connector tutorial, running the code from this page verbatim: https://docs.unstructured.io/api-reference/ingest/source-connectors/notion
I am using Python 3.10 on Ubuntu 22.04 LTS.
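The environment info lists the `unstructured` version, but the failing frames are in the `unstructured_ingest` package, so reporting that package's version too may help narrow the bug down. A minimal sketch using only the standard library; it assumes the installed distribution names are `unstructured` and `unstructured-ingest`, and `report_versions` is a hypothetical helper name.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages: list[str]) -> dict[str, str]:
    # Map each distribution name to its installed version, or a marker if absent.
    result: dict[str, str] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


print("Python:", platform.python_version())
print("Platform:", platform.platform())
for pkg, ver in report_versions(["unstructured", "unstructured-ingest"]).items():
    print(f"{pkg}: {ver}")
```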
Expected behavior
The Notion page is ingested and processed without errors.
Environment Info
OS version: Linux-5.15.0-25-generic-x86_64-with-glibc2.35
Python version: 3.10.12
unstructured version: 0.15.5
unstructured-inference is not installed
pytesseract is not installed
Torch is not installed
Detectron2 is not installed
PaddleOCR is not installed
Libmagic version: file-5.41
magic file from /etc/magic:/usr/share/misc/magic
Traceback (most recent call last):
  File "/root/unstructured/scripts/collect_env.py", line 242, in <module>
    main()
  File "/root/unstructured/scripts/collect_env.py", line 234, in main
    libreoffice_version = get_libreoffice_version()
  File "/root/unstructured/scripts/collect_env.py", line 163, in get_libreoffice_version
    result = subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'libreoffice'
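The environment script stopped at `get_libreoffice_version` because `subprocess.run` raises `FileNotFoundError` when the `libreoffice` executable is not on `PATH`. A guarded lookup along these lines would report the missing binary instead of crashing. This is a sketch, not the script's actual code: `libreoffice_version_or_none` is a hypothetical name, and it assumes `libreoffice --version` prints the version string.

```python
import shutil
import subprocess
from typing import Optional


def libreoffice_version_or_none() -> Optional[str]:
    # Look up the executable first so a missing binary returns None instead of raising.
    exe = shutil.which("libreoffice")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    # Empty output is treated the same as "unknown".
    return result.stdout.strip() or None


lo_version = libreoffice_version_or_none()
print("LibreOffice:", lo_version if lo_version is not None else "not installed")
```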
hardchor