-
Notifications
You must be signed in to change notification settings - Fork 19.4k
Description
System Info
Python 3.9.6, Langchain 0.0.334
Who can help?
Information
- The official example notebooks/scripts
- My own modified scripts
Related Components
- LLMs/Chat Models
- Embedding Models
- Prompts / Prompt Templates / Prompt Selectors
- Output Parsers
- Document Loaders
- Vector Stores / Retrievers
- Memory
- Agents / Agent Executors
- Tools / Toolkits
- Chains
- Callbacks/Tracing
- Async
Reproduction
I'm experimenting with some simple code to load a local repository to test CodeLlama, but the "exclude" in GenericLoader.from_filesystem seems not working:
`from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import LanguageParser
from langchain.text_splitter import Language
repo_path = "../../my/laravel/project/"
Load
loader = GenericLoader.from_filesystem(
repo_path,
glob="**/*",
suffixes=[".php"],
parser=LanguageParser(
parser_threshold=2000,
),
exclude=["../../my/laravel/project/vendor/", "../../my/laravel/project/node_modules/", "../../my/laravel/project/storage/", "../../my/laravel/project/public/", "../../my/laravel/project/tests/", "../../my/laravel/project/resources/"]
)
documents = loader.load()
len(documents)
`
Am I missing something obvious? I cannot find any example...with or without the exclude, the length of docs is the same (and if I just print "documents" I see files in the folders I excluded).
Expected behavior
I would expect that listing subpaths from the main path then these would be excluded.