GenericLoader.from_filesystem "exclude" not working #13751

@giancarloerra

Description

System Info

Python 3.9.6, Langchain 0.0.334

Who can help?

@eyurtsev

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

I'm experimenting with some simple code that loads a local repository to test CodeLlama, but the "exclude" argument of GenericLoader.from_filesystem does not seem to work:

```python
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import LanguageParser
from langchain.text_splitter import Language

repo_path = "../../my/laravel/project/"

# Load
loader = GenericLoader.from_filesystem(
    repo_path,
    glob="**/*",
    suffixes=[".php"],
    parser=LanguageParser(
        parser_threshold=2000,
    ),
    exclude=[
        "../../my/laravel/project/vendor/",
        "../../my/laravel/project/node_modules/",
        "../../my/laravel/project/storage/",
        "../../my/laravel/project/public/",
        "../../my/laravel/project/tests/",
        "../../my/laravel/project/resources/",
    ],
)

documents = loader.load()
len(documents)
```

Am I missing something obvious? I cannot find any example. With or without "exclude", the number of documents is the same, and if I just print "documents" I see files from the folders I excluded.

Expected behavior

I would expect that subpaths of the main path listed in "exclude" would be skipped, together with every file below them.
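To make the expected behavior concrete, here is a minimal sketch that uses only `pathlib` and no LangChain code. It assumes, without confirmation from this report, that the loader tests each candidate file against the "exclude" entries with `Path.match`; if so, plain directory paths like the ones above would never match a file inside those directories. The file paths, `EXCLUDED_DIRS`, and both helper functions are hypothetical names introduced for illustration only.

```python
from pathlib import PurePosixPath

ROOT = "../../my/laravel/project/"

# Hypothetical files, standing in for what glob="**/*" might yield.
candidates = [
    PurePosixPath(ROOT + "app/Models/User.php"),
    PurePosixPath(ROOT + "vendor/laravel/framework/src/Str.php"),
    PurePosixPath(ROOT + "tests/Feature/ExampleTest.php"),
]

# Exclude entries written as directory paths, as in the reproduction.
exclude_as_reported = [
    ROOT + "vendor/",
    ROOT + "tests/",
]


def is_excluded_by_match(path, patterns):
    """Assumed loader behavior: glob-match the file path against each entry.

    A relative pattern is matched from the right-hand end of the path, so a
    directory path only matches if the file path itself ends with it.
    """
    return any(path.match(pattern) for pattern in patterns)


# Hypothetical set of directory names to skip.
EXCLUDED_DIRS = {"vendor", "node_modules", "storage", "public", "tests", "resources"}


def is_excluded_by_parts(path, root, excluded_dirs):
    """Expected behavior: skip a file if any parent directory under root is excluded."""
    relative = path.relative_to(root)
    return any(part in excluded_dirs for part in relative.parts[:-1])


for path in candidates:
    print(
        path,
        is_excluded_by_match(path, exclude_as_reported),
        is_excluded_by_parts(path, ROOT, EXCLUDED_DIRS),
    )
```

Under that assumption, the first check never excludes anything, which would be consistent with the document count staying the same, while the second check drops the `vendor` and `tests` files. Filtering by path components is only one way to express the expectation; whether the loader should accept directory paths at all is the open question here.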

Labels

bug (Related to a bug, vulnerability, unexpected error with an existing feature)
