Using recursed_filetype fails to parse recursive paths #12

@rfishermonteith

Running wdoc with --filetype="recursive_paths" together with --recursed_filetype and --pattern fails with a TypeError while parsing the recursive paths (full error below).

This seems fixable by adding

"recursed_filetype": str, 
"pattern": str 

to https://github.com/thiswillbeyourgithub/wdoc/blob/main/wdoc/utils/misc.py#L149

However, this does create the following warning (so I may have missed something):

Cannot set key 'pattern' in a DocDict. Allowed keys are 'loading_failure,anki_tag_filter,filetype,audio_unsilence,json_dict_exclude_keys,doccheck_max_token,load_functions,recur_parent_id,source_tag,anki_template,youtube_translation,whisper_prompt,doccheck_min_token,youtube_language,file_hash,whisper_lang,path,doccheck_min_lang_prob,anki_tag_render_filter,online_media_url_regex,json_dict_template,youtube_audio_backend,anki_profile,anki_deck,audio_backend,pdf_parsers,deepgram_kwargs,anki_notetype,online_media_resourcetype_regex'
You can use the env variable WDOC_STRICT_DOCDICT to avoid this issue.
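
For context, the list of allowed keys in that warning contains neither 'pattern' nor 'recursed_filetype', which suggests the DocDict allow-list may need the same two entries. Below is a minimal sketch of how a key-restricted dict can produce this kind of warning. It is not wdoc's DocDict: the class name, the allowed keys, and the choice to still store the value are all assumptions for illustration, and it does not model WDOC_STRICT_DOCDICT.

```python
import warnings


class RestrictedDict(dict):
    """Hypothetical dict that warns when a key outside its allow-list is set."""

    # Assumed allow-list for this sketch; 'pattern' is deliberately absent.
    allowed_keys = frozenset({"path", "filetype"})

    def __setitem__(self, key, value):
        if key not in self.allowed_keys:
            allowed = ",".join(sorted(self.allowed_keys))
            warnings.warn(
                f"Cannot set key '{key}'. Allowed keys are '{allowed}'"
            )
        # Assumption: the value is stored even when the warning fires.
        super().__setitem__(key, value)


doc = RestrictedDict()
doc["path"] = "data_for_wdoc"  # allowed, no warning
doc["pattern"] = "*.txt"       # not in the allow-list, emits a UserWarning
```

In this sketch, adding "pattern" and "recursed_filetype" to allowed_keys silences the warning.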

Full command:

python -m wdoc \
  --path="data_for_wdoc" \
  --filetype="recursive_paths" \
  --task=search \
  --query="How can I make wdoc run faster?" \
  --query_retrievers='default_multiquery' \
  --top_k=auto_200_500 \
  --llms_api_bases="{'model':'http://localhost:11434','query_eval_model':'http://localhost:11434'}" \
  --modelname="ollama/gemma2:2b" \
  --query_eval_modelname="ollama/gemma2:2b" \
  --recursed_filetype="txt" \
  --pattern="*.txt"

Full error:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "XXX/venv/lib/python3.11/site-packages/wdoc/__main__.py", line 140, in <module>
    cli_launcher()
  File "XXX/venv/lib/python3.11/site-packages/wdoc/__main__.py", line 69, in cli_launcher
    fire.Fire(wdoc)
  File "XXX/venv/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "XXX/venv/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "XXX/venv/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "<@beartype(wdoc.wdoc.wdoc.__init__) at 0x1323ae7a0>", line 14, in __init__
  File "XXX/venv/lib/python3.11/site-packages/wdoc/utils/misc.py", line 703, in new_func
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "XXX/venv/lib/python3.11/site-packages/wdoc/wdoc.py", line 511, in __init__
    self.loaded_docs = batch_load_doc(
                       ^^^^^^^^^^^^^^^
  File "<@beartype(wdoc.utils.batch_file_loader.batch_load_doc) at 0x1019f0540>", line 110, in batch_load_doc
  File "XXX/venv/lib/python3.11/site-packages/wdoc/utils/batch_file_loader.py", line 158, in batch_load_doc
    parse_recursive_paths(
  File "<@beartype(wdoc.utils.batch_file_loader.parse_recursive_paths) at 0x12f5d2e80>", line 155, in parse_recursive_paths
TypeError: parse_recursive_paths() missing 2 required positional arguments: 'pattern' and 'recursed_filetype'
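
The TypeError above is consistent with the two arguments being dropped before they reach the loader. The sketch below reproduces that failure mode in isolation and shows why adding the two entries to an allow-list would fix it. It is only an illustration under assumptions: ALLOWED_ARGS, filter_args, call_loader, and parse_recursive_paths_sketch are hypothetical stand-ins, not wdoc's actual code.

```python
# Hypothetical allow-list mapping accepted argument names to their types.
# "pattern" and "recursed_filetype" are deliberately missing.
ALLOWED_ARGS = {"path": str, "filetype": str}

# Arguments as they might arrive from the command line.
CLI_KWARGS = {
    "path": "data_for_wdoc",
    "filetype": "recursive_paths",
    "pattern": "*.txt",
    "recursed_filetype": "txt",
}


def filter_args(cli_kwargs, allowed):
    """Keep only the arguments whose names appear in the allow-list."""
    return {k: v for k, v in cli_kwargs.items() if k in allowed}


def parse_recursive_paths_sketch(path, pattern, recursed_filetype):
    """Stand-in loader that requires both extra arguments."""
    return f"{path}/**/{pattern} as {recursed_filetype}"


def call_loader(allowed):
    """Filter the CLI arguments, then forward them to the loader."""
    kwargs = filter_args(CLI_KWARGS, allowed)
    kwargs.pop("filetype", None)  # consumed by the dispatcher, not the loader
    return parse_recursive_paths_sketch(**kwargs)


try:
    call_loader(ALLOWED_ARGS)
except TypeError as err:
    print(err)  # both arguments were filtered out before the call

# With the two entries added, the call goes through.
fixed = {**ALLOWED_ARGS, "recursed_filetype": str, "pattern": str}
print(call_loader(fixed))
```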
