| 1 | +# SQLDatabaseLoader |
| 2 | + |
| 3 | + |
| 4 | +## About |
| 5 | + |
| 6 | +The `SQLDatabaseLoader` loads records from any database supported by |
| 7 | +[SQLAlchemy]. See [SQLAlchemy dialects] for the full list of supported |
| 8 | +SQL databases and dialects. |
| 9 | + |
| 10 | +You can query with plain SQL or, if you are using SQLAlchemy Core or ORM, |
| 11 | +with an SQLAlchemy `Select` statement object. |
| 12 | + |
| 13 | +You can select which columns to place into the document, which columns |
| 14 | +to place into its metadata, which columns to use as a `source` attribute |
| 15 | +in metadata, and whether to include the result row number and/or the SQL |
| 16 | +query expression in the metadata. |
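
The mapping described above can be tried out without a PostgreSQL server. The following is a minimal sketch that uses only the standard library and an in-memory SQLite database. `SimpleDocument` and `rows_to_documents` are hypothetical names invented for this illustration; they are not part of LangChain, and the real loader's internals may differ.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def rows_to_documents(conn, query, content_columns=None, metadata_columns=(),
                      source_columns=(), include_rownum=False, include_query=False):
    """Turn each result row into a SimpleDocument (illustrative only)."""
    cursor = conn.execute(query)
    names = [col[0] for col in cursor.description]
    docs = []
    for rownum, values in enumerate(cursor):
        row = dict(zip(names, values))
        # Columns that form the document text; fall back to all columns.
        used = content_columns or names
        content = "\n".join(f"{name}: {row[name]}" for name in used)
        # Columns copied verbatim into the metadata dictionary.
        meta = {name: row[name] for name in metadata_columns}
        if source_columns:
            # Joining several source columns with a comma is an arbitrary
            # choice made for this sketch.
            meta["source"] = ",".join(str(row[name]) for name in source_columns)
        if include_rownum:
            meta["row"] = rownum
        if include_query:
            meta["query"] = query
        docs.append(SimpleDocument(content, meta))
    return docs


# Build a tiny table that mirrors the columns used in this document.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE mlb_teams_2012 ("Team" TEXT, "Payroll (millions)" REAL, "Wins" INTEGER)'
)
conn.executemany(
    "INSERT INTO mlb_teams_2012 VALUES (?, ?, ?)",
    [("Nationals", 81.34, 98), ("Reds", 82.2, 97), ("Yankees", 197.96, 95)],
)

docs = rows_to_documents(
    conn,
    "SELECT * FROM mlb_teams_2012 ORDER BY Wins DESC LIMIT 3",
    content_columns=["Payroll (millions)", "Wins"],
    source_columns=["Team"],
    include_rownum=True,
)
for doc in docs:
    print(doc)
```

Each option only decides where a column value ends up: in the text, in the metadata, or in the `source` key.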
| 17 | + |
| 18 | + |
| 19 | +## Example |
| 20 | + |
| 21 | +This example uses PostgreSQL and the `psycopg2` driver. |
| 22 | + |
| 23 | + |
| 24 | +### Prerequisites |
| 25 | + |
| 26 | +```shell |
| 27 | +psql postgresql://postgres@localhost/ --command "CREATE DATABASE testdrive;" |
| 28 | +psql postgresql://postgres@localhost/testdrive < ./libs/langchain/tests/integration_tests/examples/mlb_teams_2012.sql |
| 29 | +``` |
| 30 | + |
| 31 | + |
| 32 | +### Basic loading |
| 33 | + |
| 34 | +```python |
| 35 | +from langchain_community.document_loaders.sql_database import SQLDatabaseLoader |
| 36 | +from pprint import pprint |
| 37 | + |
| 38 | + |
| 39 | +loader = SQLDatabaseLoader( |
| 40 | + query="SELECT * FROM mlb_teams_2012 LIMIT 3;", |
| 41 | + url="postgresql+psycopg2://postgres@localhost:5432/testdrive", |
| 42 | +) |
| 43 | +docs = loader.load() |
| 44 | +``` |
| 45 | + |
| 46 | +```python |
| 47 | +pprint(docs) |
| 48 | +``` |
| 49 | + |
| 50 | +<CodeOutputBlock lang="python"> |
| 51 | + |
| 52 | +``` |
| 53 | +[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={}), |
| 54 | + Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={}), |
| 55 | + Document(page_content='Team: Yankees\nPayroll (millions): 197.96\nWins: 95', metadata={})] |
| 56 | +``` |
| 57 | + |
| 58 | +</CodeOutputBlock> |
| 59 | + |
| 60 | + |
| 61 | +## Enriching metadata |
| 62 | + |
| 63 | +Set the `include_rownum_into_metadata` and `include_query_into_metadata` options to |
| 64 | +add the result row number and the SQL query to the `metadata` dictionary. |
| 65 | + |
| 66 | +Keeping the `query` in the metadata is useful when documents loaded from database |
| 67 | +tables feed chains that answer questions using their origin queries. |
| 68 | + |
| 69 | +```python |
| 70 | +loader = SQLDatabaseLoader( |
| 71 | + query="SELECT * FROM mlb_teams_2012 LIMIT 3;", |
| 72 | + url="postgresql+psycopg2://postgres@localhost:5432/testdrive", |
| 73 | + include_rownum_into_metadata=True, |
| 74 | + include_query_into_metadata=True, |
| 75 | +) |
| 76 | +docs = loader.load() |
| 77 | +``` |
| 78 | + |
| 79 | +```python |
| 80 | +pprint(docs) |
| 81 | +``` |
| 82 | + |
| 83 | +<CodeOutputBlock lang="python"> |
| 84 | + |
| 85 | +``` |
| 86 | +[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={'row': 0, 'query': 'SELECT * FROM mlb_teams_2012 LIMIT 3;'}), |
| 87 | + Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={'row': 1, 'query': 'SELECT * FROM mlb_teams_2012 LIMIT 3;'}), |
| 88 | + Document(page_content='Team: Yankees\nPayroll (millions): 197.96\nWins: 95', metadata={'row': 2, 'query': 'SELECT * FROM mlb_teams_2012 LIMIT 3;'})] |
| 89 | +``` |
| 90 | + |
| 91 | +</CodeOutputBlock> |
| 92 | + |
| 93 | + |
| 94 | +## Customizing metadata |
| 95 | + |
| 96 | +Use the `page_content_mapper` and `metadata_mapper` options to control which columns |
| 97 | +go into the page content and which go into the `metadata` dictionary. When no |
| 98 | +`page_content_mapper` is given, all columns are used for the page content. |
| 99 | + |
| 100 | +```python |
| 101 | +import functools |
| 102 | + |
| 103 | +row_to_content = functools.partial( |
| 104 | + SQLDatabaseLoader.page_content_default_mapper, column_names=["Payroll (millions)", "Wins"] |
| 105 | +) |
| 106 | +row_to_metadata = functools.partial( |
| 107 | + SQLDatabaseLoader.metadata_default_mapper, column_names=["Team"] |
| 108 | +) |
| 109 | + |
| 110 | +loader = SQLDatabaseLoader( |
| 111 | + query="SELECT * FROM mlb_teams_2012 LIMIT 3;", |
| 112 | + url="postgresql+psycopg2://postgres@localhost:5432/testdrive", |
| 113 | + page_content_mapper=row_to_content, |
| 114 | + metadata_mapper=row_to_metadata, |
| 115 | +) |
| 116 | +docs = loader.load() |
| 117 | +``` |
| 118 | + |
| 119 | +```python |
| 120 | +pprint(docs) |
| 121 | +``` |
| 122 | + |
| 123 | +<CodeOutputBlock lang="python"> |
| 124 | + |
| 125 | +``` |
| 126 | +[Document(page_content='Payroll (millions): 81.34\nWins: 98', metadata={'Team': 'Nationals'}), |
| 127 | + Document(page_content='Payroll (millions): 82.2\nWins: 97', metadata={'Team': 'Reds'}), |
| 128 | + Document(page_content='Payroll (millions): 197.96\nWins: 95', metadata={'Team': 'Yankees'})] |
| 129 | +``` |
| 130 | + |
| 131 | +</CodeOutputBlock> |
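
If the default mappers are not flexible enough, a custom callable could take their place. The sketch below assumes that a mapper receives one row as a mapping from column names to values and returns either the page content string or the metadata dictionary. That calling convention is inferred from the example above rather than confirmed here, and `payroll_sentence` and `team_metadata` are hypothetical names. A plain dict stands in for a database row so the snippet runs without a database.

```python
from typing import Any, Dict, Mapping


def payroll_sentence(row: Mapping[str, Any]) -> str:
    """Hypothetical content mapper: render a row as one readable sentence."""
    return (
        f"The {row['Team']} won {row['Wins']} games "
        f"on a payroll of {row['Payroll (millions)']} million dollars."
    )


def team_metadata(row: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical metadata mapper: keep only the team name."""
    return {"team": row["Team"]}


# A plain dict stands in for a database row (an assumption of this sketch).
sample_row = {"Team": "Reds", "Payroll (millions)": 82.2, "Wins": 97}
content = payroll_sentence(sample_row)
metadata = team_metadata(sample_row)
print(content)
print(metadata)
```

Under the same assumption, these callables would be passed as `page_content_mapper=payroll_sentence` and `metadata_mapper=team_metadata`.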
| 132 | + |
| 133 | + |
| 134 | +## Specify column(s) to identify the document source |
| 135 | + |
| 136 | +Use the `source_columns` option to specify the columns to use as a "source" for the |
| 137 | +document created from each row. This is useful for identifying documents through |
| 138 | +their metadata. Typically, you may use the primary key column(s) for that purpose. |
| 139 | + |
| 140 | +```python |
| 141 | +loader = SQLDatabaseLoader( |
| 142 | + query="SELECT * FROM mlb_teams_2012 LIMIT 3;", |
| 143 | + url="postgresql+psycopg2://postgres@localhost:5432/testdrive", |
| 144 | + source_columns=["Team"], |
| 145 | +) |
| 146 | +docs = loader.load() |
| 147 | +``` |
| 148 | + |
| 149 | +```python |
| 150 | +pprint(docs) |
| 151 | +``` |
| 152 | + |
| 153 | +<CodeOutputBlock lang="python"> |
| 154 | + |
| 155 | +``` |
| 156 | +[Document(page_content='Team: Nationals\nPayroll (millions): 81.34\nWins: 98', metadata={'source': 'Nationals'}), |
| 157 | + Document(page_content='Team: Reds\nPayroll (millions): 82.2\nWins: 97', metadata={'source': 'Reds'}), |
| 158 | + Document(page_content='Team: Yankees\nPayroll (millions): 197.96\nWins: 95', metadata={'source': 'Yankees'})] |
| 159 | +``` |
| 160 | + |
| 161 | +</CodeOutputBlock> |
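
Once every document carries a `source` value, looking one up by that value is straightforward. The following is a small sketch of such a lookup. `SimpleDocument` is a hypothetical stand-in for the loaded documents, populated by hand with the values shown in the output above, and the index assumes that `source` values are unique, as they would be for a primary key.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


# Hand-written copies of the documents shown in the output above.
docs = [
    SimpleDocument("Team: Nationals\nPayroll (millions): 81.34\nWins: 98", {"source": "Nationals"}),
    SimpleDocument("Team: Reds\nPayroll (millions): 82.2\nWins: 97", {"source": "Reds"}),
    SimpleDocument("Team: Yankees\nPayroll (millions): 197.96\nWins: 95", {"source": "Yankees"}),
]

# Index the documents by their source; later duplicates would overwrite
# earlier ones, so this only makes sense for unique source values.
by_source = {doc.metadata["source"]: doc for doc in docs}

print(by_source["Reds"].page_content)
```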
| 162 | + |
| 163 | + |
| 164 | +[SQLAlchemy]: https://www.sqlalchemy.org/ |
| 165 | +[SQLAlchemy dialects]: https://docs.sqlalchemy.org/en/20/dialects/ |