Commit 3811746

community/docs: Add missing documentation about SQLDatabaseLoader
1 parent 8b16275 commit 3811746

2 files changed: +525 -0 lines changed

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
# SQLDatabaseLoader


## About

The `SQLDatabaseLoader` loads records from any database supported by
[SQLAlchemy]. See [SQLAlchemy dialects] for the full list of supported
SQL databases and dialects.

You can query with plain SQL, or with an SQLAlchemy `Select`
statement object if you are using SQLAlchemy Core or ORM.

You can select which columns to place into the document, which columns
to place into its metadata, which columns to use as a `source` attribute
in metadata, and whether to include the result row number and/or the SQL
query expression in the metadata.


## Example

This example uses PostgreSQL and the `psycopg2` driver.


### Prerequisites

```shell
psql postgresql://postgres@localhost/ --command "CREATE DATABASE testdrive;"
psql postgresql://postgres@localhost/testdrive < ./libs/langchain/tests/integration_tests/examples/mlb_teams_2012.sql
```


### Basic loading

```python
from langchain_community.document_loaders.sql_database import SQLDatabaseLoader
from pprint import pprint


loader = SQLDatabaseLoader(
    query="SELECT * FROM mlb_teams_2012 LIMIT 3;",
    url="postgresql+psycopg2://postgres@localhost:5432/testdrive",
)
docs = loader.load()
```

```python
pprint(docs)
```

<CodeOutputBlock lang="python">

```
[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={}),
 Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={}),
 Document(page_content='Team: Yankees\nPayroll (millions): 197.96\nWins: 95', metadata={})]
```

</CodeOutputBlock>
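
The `page_content` strings in the output above use one `column: value` pair per line. The following is a minimal sketch of that layout using only the standard library. It is not the loader's implementation: the in-memory SQLite table stands in for the PostgreSQL database, and `format_row` is a hypothetical helper introduced here for illustration.

```python
import sqlite3

# Stand-in data: an in-memory SQLite table holding the three rows shown above.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be indexed by column name
conn.execute(
    'CREATE TABLE mlb_teams_2012 '
    '("Team" TEXT, "Payroll (millions)" REAL, "Wins" INTEGER)'
)
conn.executemany(
    "INSERT INTO mlb_teams_2012 VALUES (?, ?, ?)",
    [("Nationals", 81.34, 98), ("Reds", 82.2, 97), ("Yankees", 197.96, 95)],
)


def format_row(row):
    """Hypothetical helper: render one row as 'column: value' lines."""
    return "\n".join(f"{key}: {row[key]}" for key in row.keys())


rows = conn.execute("SELECT * FROM mlb_teams_2012 LIMIT 3").fetchall()
contents = [format_row(row) for row in rows]
```

Each entry in `contents` then has the same shape as the `page_content` values printed above.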


## Enriching metadata

Use the `include_rownum_into_metadata` and `include_query_into_metadata` options to
add the result row number and the SQL query to the `metadata` dictionary.

Having the `query` in the metadata is useful when documents loaded from
database tables feed chains that answer questions using their origin queries.

```python
loader = SQLDatabaseLoader(
    query="SELECT * FROM mlb_teams_2012 LIMIT 3;",
    url="postgresql+psycopg2://postgres@localhost:5432/testdrive",
    include_rownum_into_metadata=True,
    include_query_into_metadata=True,
)
docs = loader.load()
```

```python
pprint(docs)
```

<CodeOutputBlock lang="python">

```
[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={'row': 0, 'query': 'SELECT * FROM mlb_teams_2012 LIMIT 3;'}),
 Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={'row': 1, 'query': 'SELECT * FROM mlb_teams_2012 LIMIT 3;'}),
 Document(page_content='Team: Yankees\nPayroll (millions): 197.96\nWins: 95', metadata={'row': 2, 'query': 'SELECT * FROM mlb_teams_2012 LIMIT 3;'})]
```

</CodeOutputBlock>
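
To make the effect of the two flags concrete, here is a small standard-library sketch that builds the same `row` and `query` metadata entries as the output above. The function `build_metadata` and its parameter names are hypothetical and exist only for this illustration; the hard-coded `rows` list stands in for a real result set.

```python
query = "SELECT * FROM mlb_teams_2012 LIMIT 3;"

# Stand-in for the result set; a real loader would fetch these rows.
rows = [
    {"Team": "Nationals", "Payroll (millions)": 81.34, "Wins": 98},
    {"Team": "Reds", "Payroll (millions)": 82.2, "Wins": 97},
    {"Team": "Yankees", "Payroll (millions)": 197.96, "Wins": 95},
]


def build_metadata(rows, query, include_rownum=False, include_query=False):
    """Hypothetical helper: build one metadata dict per result row."""
    result = []
    for index, _row in enumerate(rows):
        metadata = {}
        if include_rownum:
            metadata["row"] = index  # zero-based, matching the output above
        if include_query:
            metadata["query"] = query
        result.append(metadata)
    return result


enriched = build_metadata(rows, query, include_rownum=True, include_query=True)
plain = build_metadata(rows, query)
```

With both flags off, every metadata dictionary stays empty, which matches the basic loading example.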


## Customizing metadata

Use the `page_content_mapper` and `metadata_mapper` options to choose which columns
make up the page content and which columns populate the `metadata` dictionary. The
example below restricts the default mappers to specific columns with
`functools.partial`. When no page content columns are selected, all columns are used.

```python
import functools

# Restrict the page content to the payroll and wins columns.
row_to_content = functools.partial(
    SQLDatabaseLoader.page_content_default_mapper, column_names=["Payroll (millions)", "Wins"]
)
# Place the team name into the metadata instead.
row_to_metadata = functools.partial(
    SQLDatabaseLoader.metadata_default_mapper, column_names=["Team"]
)

loader = SQLDatabaseLoader(
    query="SELECT * FROM mlb_teams_2012 LIMIT 3;",
    url="postgresql+psycopg2://postgres@localhost:5432/testdrive",
    page_content_mapper=row_to_content,
    metadata_mapper=row_to_metadata,
)
docs = loader.load()
```

```python
pprint(docs)
```

<CodeOutputBlock lang="python">

```
[Document(page_content='Payroll (millions): 81.34\nWins: 98', metadata={'Team': 'Nationals'}),
 Document(page_content='Payroll (millions): 82.2\nWins: 97', metadata={'Team': 'Reds'}),
 Document(page_content='Payroll (millions): 197.96\nWins: 95', metadata={'Team': 'Yankees'})]
```

</CodeOutputBlock>
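
The `functools.partial` pattern above pre-binds `column_names`, so the loader can call each mapper with a row alone. The sketch below shows the same pattern with two hypothetical stand-in mappers, `content_mapper` and `metadata_mapper`. They are assumptions written to reproduce the outputs shown in this document, not the library's own code.

```python
import functools


def content_mapper(row, column_names=None):
    """Hypothetical stand-in: render selected columns as 'column: value' lines."""
    names = column_names if column_names else list(row.keys())
    return "\n".join(f"{name}: {row[name]}" for name in names)


def metadata_mapper(row, column_names=None):
    """Hypothetical stand-in: copy selected columns into a metadata dict."""
    return {name: row[name] for name in (column_names or [])}


row = {"Team": "Nationals", "Payroll (millions)": 81.34, "Wins": 98}

# Pre-bind the column selection; the resulting callables take only a row.
row_to_content = functools.partial(
    content_mapper, column_names=["Payroll (millions)", "Wins"]
)
row_to_metadata = functools.partial(metadata_mapper, column_names=["Team"])

page_content = row_to_content(row)
metadata = row_to_metadata(row)
all_columns = content_mapper(row)  # no selection: every column is rendered
```

Binding the arguments up front keeps the mapper interface to a single `row` argument while still allowing a per-loader column selection.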


## Specify column(s) to identify the document source

Use the `source_columns` option to specify the columns to use as a "source" for the
document created from each row. This is useful for identifying documents through
their metadata. Typically, the primary key column(s) serve that purpose.

```python
loader = SQLDatabaseLoader(
    query="SELECT * FROM mlb_teams_2012 LIMIT 3;",
    url="postgresql+psycopg2://postgres@localhost:5432/testdrive",
    source_columns=["Team"],
)
docs = loader.load()
```

```python
pprint(docs)
```

<CodeOutputBlock lang="python">

```
[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={'source': 'Nationals'}),
 Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={'source': 'Reds'}),
 Document(page_content='Team: Yankees\nPayroll (millions): 197.96\nWins: 95', metadata={'source': 'Yankees'})]
```

</CodeOutputBlock>
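
As an illustration of deriving a `source` value from key columns, here is a short standard-library sketch. The helper `source_from_row` is hypothetical. For a single column it reproduces the output above; joining several columns with a comma is purely an assumption made for this sketch, and the loader may combine them differently.

```python
def source_from_row(row, source_columns):
    """Hypothetical helper: derive a 'source' string from the chosen columns.

    Assumption: multiple column values are joined with a comma.
    """
    return ",".join(str(row[name]) for name in source_columns)


row = {"Team": "Nationals", "Payroll (millions)": 81.34, "Wins": 98}

single = {"source": source_from_row(row, ["Team"])}
composite = {"source": source_from_row(row, ["Team", "Wins"])}
```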


[SQLAlchemy]: https://www.sqlalchemy.org/
[SQLAlchemy dialects]: https://docs.sqlalchemy.org/en/20/dialects/
