From 75af114fd99986ea940902c57d141ece1d9d2562 Mon Sep 17 00:00:00 2001 From: Dj Walker-Morgan Date: Wed, 14 May 2025 17:53:18 +0100 Subject: [PATCH 01/28] Release Notes for 4.1.0 stubbed Signed-off-by: Dj Walker-Morgan --- .../ai-accelerator_4.1.0_rel_notes.mdx | 23 +++++++++++++++++++ .../ai-accelerator/rel_notes/index.mdx | 2 ++ .../rel_notes/src/rel_notes_4.1.0.yml | 17 ++++++++++++++ 3 files changed, 42 insertions(+) create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx new file mode 100644 index 00000000000..cad6f23262f --- /dev/null +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx @@ -0,0 +1,23 @@ +--- +title: AI Accelerator - Pipelines 4.1.0 release notes +navTitle: Version 4.1.0 +originalFilePath: advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml +editTarget: originalFilePath +--- + +Released: 19 May 2025 + +This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline. + +## Highlights + +- MOAR AI + +## Enhancements + + + +
| Description | Addresses |
|---|---|
| Placeholder for future release note. Soon. | |
+
+ + diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx index dbc87bf6dfd..a46870bd873 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx @@ -4,6 +4,7 @@ navTitle: Release notes description: Release notes for EDB Postgres AI - AI Accelerator indexCards: none navigation: + - ai-accelerator_4.1.0_rel_notes - ai-accelerator_4.0.1_rel_notes - ai-accelerator_4.0.0_rel_notes - ai-accelerator_3.0.1_rel_notes @@ -22,6 +23,7 @@ The EDB Postgres AI - AI Accelerator describes the latest version of AI Accelera | AI Accelerator version | Release Date | |---|---| +| [4.1.0](./ai-accelerator_4.1.0_rel_notes) | 19 May 2025 | | [4.0.1](./ai-accelerator_4.0.1_rel_notes) | 09 May 2025 | | [4.0.0](./ai-accelerator_4.0.0_rel_notes) | 05 May 2025 | | [3.0.1](./ai-accelerator_3.0.1_rel_notes) | 03 Apr 2025 | diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml new file mode 100644 index 00000000000..d7c8eebe66a --- /dev/null +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml @@ -0,0 +1,17 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/EnterpriseDB/docs/refs/heads/develop/tools/automation/generators/relgen/relnote-schema.json +product: AI Accelerator - Pipelines +version: 4.1.0 +date: 19 May 2025 +intro: | + This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline. +highlights: | + - MOAR AI +relnotes: +- relnote: Placeholder for future release note. + details: | + Soon. 
+ jira: "" + addresses: "" + type: Enhancement + impact: Medium + From abed1b5edbd8a5a19d53d83956a301190dfd6b36 Mon Sep 17 00:00:00 2001 From: Noah Baculi Date: Wed, 14 May 2025 21:09:55 -0700 Subject: [PATCH 02/28] fix typo --- .../ai-accelerator/preparers/examples/chunk_text.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx index aa4340663cb..906f59d5d99 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx @@ -3,7 +3,7 @@ title: Preparers chunk text operation examples navTitle: Chunk text description: Examples of using preparers with the ChunkText operation in AI Accelerator. --- -These dxamples use preparers with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator. +These examples use preparers with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator. ## Primitive From 0340e4a3a05e81ca259d03e1dfa2ae3b33f10438 Mon Sep 17 00:00:00 2001 From: Noah Baculi Date: Wed, 14 May 2025 21:10:13 -0700 Subject: [PATCH 03/28] update primitive examples with output --- .../ai-accelerator/preparers/primitives.mdx | 71 +++++++++++++++++-- 1 file changed, 65 insertions(+), 6 deletions(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx index 2d0e2d556b5..a661a32883b 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx @@ -17,20 +17,25 @@ All data preparation operations can be customized with different options. The AP Call `aidb.chunk_text()` to break text into smaller chunks. 
```sql -SELECT - chunk_id, - chunk -FROM aidb.chunk_text( +SELECT * FROM aidb.chunk_text( input => 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. This enables processing or storage of data in manageable parts.', options => '{"desired_length": 120, "max_length": 150}' ); + +__OUTPUT__ + part_id | chunk +---------+--------------------------------------------------------------------------------------------------------------------------------------------------- + 0 | This is a significantly longer text example that might require splitting into smaller chunks. + 1 | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. + 2 | This enables processing or storage of data in manageable parts. +(3 rows) ``` - The `desired_length` size is the target size for the chunk. In most cases, this value also serves as the maximum size of the chunk. It's possible for a chunk to be returned that's less than the `desired` value, as adding the next piece of text may have made it larger than the `desired` capacity. - The `max_length` size is the maximum possible chunk size that can be generated. Setting this to a value larger than `desired` means that the chunk should be as close to `desired` as possible but can be larger if it means staying at a larger semantic level. -!!! Note -This primitive function returns each chunk with a `chunk_id` for ease of development. However, a preparer with the `ChunkText` operation outputs a single text array per input that can then be unnested as desired. +!!! Tip +This operation transforms the shape of the data, automatically unnesting collections. As a result, there may be multiple output rows for each input with a new `part_id` column to track the additional dimension. !!! 
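To see how `desired_length` and `max_length` interact, it can help to look at the size of each returned chunk. The following is a minimal sketch, not part of the patch: it assumes only the `part_id` and `chunk` output columns shown in the example output and uses the standard Postgres `length()` function. The input string is illustrative.

```sql
-- Sketch: report the character length of each chunk next to its part_id.
SELECT
    part_id,
    length(chunk) AS chunk_length,  -- standard Postgres string function
    chunk
FROM aidb.chunk_text(
    input => 'This sentence should be its own chunk. This too.',
    options => '{"desired_length": 10, "max_length": 15}'
)
ORDER BY part_id;
```

Going by the option descriptions above, `chunk_length` values larger than `desired_length` would be chunks that were allowed to grow toward `max_length` to stay at a larger semantic level.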
## Parse HTML @@ -55,6 +60,22 @@ SELECT * FROM aidb.parse_html( ', options => '{"method": "StructuredPlaintext"}' -- Default ); + +__OUTPUT__ + parse_html +----------------------------------------------------------- + Hello, world! + + + + This is my first web page. + + + + It contains some bold text, some italic test, and a link.+ + + + Postgres Logo Image + + List item + + List item + + List item + + +(1 row) ``` - The `method` determines how the HTML is parsed: @@ -70,12 +91,24 @@ SELECT * FROM aidb.parse_pdf( bytes => decode('255044462d312e340a25b89a929d0a312030206f626a3c3c2f547970652f436174616c6f672f50616765732033203020523e3e0a656e646f626a0a322030206f626a3c3c2f50726f64756365722847656d426f782047656d426f782e50646620312e37202831372e302e33352e313034323b202e4e4554204672616d65776f726b29292f4372656174696f6e4461746528443a32303231313032383135313732312b303227303027293e3e0a656e646f626a0a332030206f626a3c3c2f547970652f50616765732f4b6964735b34203020525d2f436f756e7420312f4d65646961426f785b302030203539352e3332203834312e39325d3e3e0a656e646f626a0a342030206f626a3c3c2f547970652f506167652f506172656e742033203020522f5265736f75726365733c3c2f466f6e743c3c2f46302036203020523e3e3e3e2f436f6e74656e74732035203020523e3e0a656e646f626a0a352030206f626a3c3c2f4c656e6774682035393e3e73747265616d0a42540a2f46302031322054660a3120302030203120313030203730322e3733363636363720546d0a2848656c6c6f20576f726c642129546a0a45540a656e6473747265616d0a656e646f626a0a362030206f626a3c3c2f547970652f466f6e742f537562747970652f54797065312f42617365466f6e742f48656c7665746963612f4669727374436861722033322f4c61737443686172203131342f5769647468732037203020522f466f6e7444657363726970746f722038203020523e3e0a656e646f626a0a372030206f626a5b32373820323738203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302037323220302030203020302030203020302030203020302030203020302030203934342030203020302030203020302030203020302030203020302035353620353536
20302030203020302030203020323232203020302035353620302030203333335d0a656e646f626a0a382030206f626a3c3c2f547970652f466f6e7444657363726970746f722f466c6167732033322f466f6e744e616d652f48656c7665746963612f466f6e7446616d696c792848656c766574696361292f466f6e74576569676874203530302f4974616c6963416e676c6520302f466f6e7442426f785b2d313636202d3232352031303030203933315d2f436170486569676874203731382f58486569676874203532332f417363656e74203731382f44657363656e74202d3230372f5374656d482037362f5374656d562038383e3e0a656e646f626a0a787265660a3020390a303030303030303030302036353533352066200a30303030303030303135203030303030206e200a30303030303030303539203030303030206e200a30303030303030313739203030303030206e200a30303030303030323537203030303030206e200a30303030303030333436203030303030206e200a30303030303030343531203030303030206e200a30303030303030353733203030303030206e200a30303030303030373733203030303030206e200a747261696c65720a3c3c2f526f6f742031203020522f49445b3c39333932413539463342453742383430383035443632373436453841344632393e3c39333932413539463342453742383430383035443632373436453841344632393e5d2f496e666f2032203020522f53697a6520393e3e0a7374617274787265660a3938380a2525454f460a', 'hex'), options => '{"method": "Structured", "allow_partial_parsing": true}' -- Default ); + +__OUTPUT__ + part_id | text +---------+-------------- + 0 | Hello World!+ + | +(1 row) ``` - The `method` determines how the PDF is parsed: - `Structured` (Default) — Algorithmic text extraction. - The `allow_partial_parsing` flag determines whether to continue to parse PDFs when the parser encounters errors on one or more pages. Defaults to `true`. + +!!! Tip +This operation transforms the shape of the data, automatically unnesting collections. As a result, there may be multiple output rows for each input with a new `part_id` column to track the additional dimension. +!!! 
+ ## Summarize text Call `aidb.summarize_text()` to summarize text: @@ -88,6 +121,17 @@ SELECT * FROM aidb.summarize_text( input => 'There are times when the night sky glows with bands of color. The bands may begin as cloud shapes and then spread into a great arc across the entire sky. They may fall in folds like a curtain drawn across the heavens. The lights usually grow brighter, then suddenly dim. During this time the sky glows with pale yellow, pink, green, violet, blue, and red. These lights are called the Aurora Borealis. Some people call them the Northern Lights. Scientists have been watching them for hundreds of years. They are not quite sure what causes them. In ancient times Long Beach City College WRSC Page 2 of 2 people were afraid of the Lights. They imagined that they saw fiery dragons in the sky. Some even concluded that the heavens were on fire.', options => '{"model": "my_t5_model"}' ); + +__OUTPUT__ + create_model +-------------- + my_t5_model +(1 row) + + summarize_text +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + the night sky glows with bands of color . they may begin as cloud shapes and then spread into a great arc across the entire sky . the lights usually grow brighter, then suddenly dim . +(1 row) ``` - The `model` is the name of the created model to use for summarization. The model must support the `decode_text()` and `decode_text_batch()` [model primitives](../models/primitives). 
@@ -108,11 +152,26 @@ SELECT * FROM aidb.perform_ocr( decode('/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAUFBQUFBQUGBgUICAcICAsKCQkKCxEMDQwNDBEaEBMQEBMQGhcbFhUWGxcpIBwcICkvJyUnLzkzMzlHREddXX0BBQUFBQUFBQYGBQgIBwgICwoJCQoLEQwNDA0MERoQExAQExAaFxsWFRYbFykgHBwgKS8nJScvOTMzOUdER11dff/CABEIAWYDKgMBIgACEQEDEQH/xAAvAAEAAQUBAQEAAAAAAAAAAAAAAQIDBAcIBgkFAQEAAAAAAAAAAAAAAAAAAAAA/9oADAMBAAIQAxAAAADsuKLhCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRE2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAeS/INiNdjYj8z9MAAAAAAAAAAFJUiQAAAAAAAAA8l60AAAAAAAAAAAAAtXbV0pqpqAAAAAAAAAAESNE617AwD5kdJap+gp5L2PC/459AJ5c/ROko4DxD6EU+I4rPoY+enVJuJzfpI79ngXrU2LT88PfnaEcVfhHek8K9jno6fnrmH0A5c/S5YO89icId3FSOODsiOC847meN4qPoPPzx6iN0zw9hHeD5yeqO8Y+cfeZ6ev5uepO+nzz71OD/AKDfPnsc2BHz7xj6HPBe9AAAAAAAAAAALV21dKaqagAAAAAAAAAABgZ+AcB9U8rddHGO7vC9XHz43/me6NMbI3JxwdYcV9C7HNYbf17pY1f0lo3uI4n3V+3+oaE745D6qNBYHPvdp89uldE78PWb15730as1P+h+cbw2VrDZw4F7757Nd+52Vwsd76d8b6I8H6LZ2sjWvf8AwN9Bjh7qvlbrY0P7bxfuDkP6Q/M/6UnPuDbzzRP6f5npD9GxuT9Y557i576EAAAAAAAAAAALV21dKaqagAAAAAAAAAAB+f8AoDkPrmscZeZ7zHN3ot3jgnI7uHhuRe8xwb0NuwaL5774HEXUXuxzt0HeHCE93DgLq7Z44Rx+9hqWxuEcQ9vRJHMHUA4O/Y7ZGpOWfoCOJMTuYci9cVDmPf8A+6NIei2bQfMjZn5vf5wr2z+kOReivYDgivvMeI9wAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAiRh5gAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVu2ZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHFd2xfIgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIrD//EADIQAAAFBAEEAQMDAgcBAAAAAAABAwQFAgYHERQIEDI0YBIhQBYxNxUXExgiMzZBUTX/2gAIAQEAAQgAM/2IaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMGejIjCn+9R8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl+PWJq+bPtt3Q0mv7s40H92caA8uY2IvtEzEXOsUZCK/FMGYI/w6vsQjr6tGWlFIljTv/v8qvzT7Uewl+PV+32yLgyLyLO0y7qS6V4JjHP3dOLLBaZCuv8Aobk+k6AFiWehYtuM4Fv/AKgRgxVVsEezBjf3BmewRgz0NmCP7gwdX/h1aLY2exv/AMI/uDFVWwZiqrY6nZyZhIi0zjMSvXL7G9ouXdPYx9QIwZ6BfuDq+w+obH1D6h9Q+ob0Nj6hsGMOfzu9HiPqBH+TX5p9qPYS/I0LiPcFOjplL68nBdZJuiqsreWX76v+4lLesZ/jvPVutq5YsIZrf3O8/TNyZ3y5I2OTSFhY7Hud7uaJTJxuRsoYkuNCPuS5L4joCyFrtDGYzJmSQd1xU3B5xxeknLr4ayVXkS3FFHubsyubKqRgYJpYGe7sbUzNdr5ZyDjC4k4a8MgTSieOLgmIiAzpdUPa0qzPEUNll3e8RO3FlDMt0zN0r2jZR4kzyuib6vFOWrzgL0ZWrdFz3JHWlBSM3Jr35ljME8sxgZayc8WEzrmDw9lx9kSMmIaRyVCZPim8VXe2LbfzKudmvWlAqPQzNma40blVs+1KMUZ7WQ/qJ42zFeFrXY3te8r7vBnY1tSE46j5TNmY3rpeMuC182Y1bFMr4OyO8yFbjs5LM+QbptfKRos6Es5Zg/x5dkyvnJ2ILppYzNUnmjNDp28g5STy/h+XZFI2fciV22nDzidWYb7gLunq6GdjZ5vZrTMqxeSsoYluGiPuGCmWdww7CUZYc/nZ6MjX0yx/bjiVcsl83ZiXcPWD93mrD7xs6kMb36wyBbbeUQ/Hr80+1HsJfk3Ee4KdHTL/ACcM9SriIxjOKIYeynb+M05ZV5/mxt0K3SweZQZXFCdR2PZ2adR91RNn9TEtAs20XcJ3zhLLTqPSnb+sei4McvrYhrMv28sLSD+NcodRWPLqj14q48cwOP4ti5fWYtdbOjKbm5Zz/NfbtJERZgytb2TG0WbS2ZNaT6ZJMluma02U7c0tLvf+vs6h8K4muKidcSnVLaTYq6Y5tcLy6svQU296q5RZC2bYjKOnSAaxWN457QommqnWnXaOFrQsqfonYnqxMv6PZowyZH
iqySMVHovvmmxrqtW/nV3xtu9VWkk0bijHuD8qz9Ekd9Wnbl4QpsrijMq4cxdG1QUHkbqGQu+3ZSChukv1r2GfUKHeYTQrYMGsYyaMWvVikmU9ai9ONo5tH2LaTZr1VoUfo6AVPAf8VQQx5ANbjzXWzdU0kRERdS1uspGw/wCrH0uy67yy5Rirhn+eXY6pZZda7ouMFsdSNqWxARcO1vDqLtS7rblIRx0pyaqc9cUaZfj1+afaj2EvybiPcFOjpl/k4ZphlZvGlxtkenJGyJn+sQk//bTH5Ee17xxknfqNqxV75zj7Euv9OO5CybEu9qi5eZ1xRbthpxsnCWdlF3a+E4K4ZOyrstTMMCu8Xv7AtgPoeUkGHTBKvm15ykRSszibQzc5bXCjjrHLpFNZvk55ivGqEcS8k6jpDAs6/jekoqDZ3mZz7xWMgZyQRxHbLXKGQXn6lj7CseCR+prPS8ZOZ7YvYvqohl3Vq29LJ9N11sZexW8IcjJM4pg6fvsU5ou/IN81xVfVgW4izRhqojxdZZkW9isN89wru917NfzuK8f3GVZvst2UzxjeDJCFz7dkypj3HiAwhiOyntmxdwymYCtG0Mdz6KfSWW0L1pLOP80tu3Vl/wDYs4WP/wAMtQdVf/CoMYC/iWCFm3OhaGYlJVw2dJOUUXDbqZu1kytJK3qOmaHVj7BcvVMM/wA8ux1Vw6qM/b8xTj23cYXhacRJNrqtbGFpwT+ZfYVua1LudTDqF/Hr80+1HsJfkyyCrqKlWyWEcSXpZV8HLzCqdKidSdd/dO00hLqzVkHavUi/QOLWxDg+myXZTk7l7EDXIyDZ2zY2P1D2jQbGJa4PypfEmg7vFTH1uHZX6NN1g7KlkSqjm0Xlk9RN2pkxlsQ4haY4aLuXWXcNMsip0SLJlY3UNahVMIm2sAXpc03RLX5eltqSNgTdvw+AMe3LYTe4051dJJwiskrceAr+tafVkrO/QfUJeRExmrs6e7ytqSil7XtO3Zaaxq2hL9mMBZFs2YqkbMe48z/exJs5vFeLo/GsWskWZscOskW22aMLKxbmi25q30zpFX7fbL+B1btk1bhtttb3UrFJUMG1qdPd2zs4lL33lXGTfIFsNY1tGWDn+zf8VhCRuA7/ALwVXf3vi3HWXLNvOPoLKOI7zufJSE9GfcZ/xjdl+SNuLwdrMXMbbsAwc56sefvm2othCYpt2UtOw4yHlLbs6i/MmzkFWlYnUJZlNcbCW50/XxdMyUnesZGs4dg0jmOOMS3lbmVHFxSF62bE31AuYiTVwxmCxpBeu1VMVZvv1y3TuawLHi7AgUYlh+PX5p9qPYS/K120Q0Q0Q0Q0NENENENDRDQ0NENF30Q0NENENENDRdtF+3fQ0Q0WxoaIaGiGiGiGi7q/7dQwwnWWd3lRl+/30Q0NENECIiGi/Jr80+1HsJfAzLYoYR6KhrIfn1+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2WqIlaa6uUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgProVqoOkVUFUDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbU
EDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEKaaaf2H//xABDEAABAwEFBQcBBAYIBwAAAAABAAIDBAUREpKhEBNhYrEGFDEyQVFgQCEiU1QVQlBSgpMjM0NVcoGywgcgMDSRs9H/2gAIAQEACT8A8SjojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojovXZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fUdo6SinczGI5XLttZmddtrMzrttZv8xV8dXRy3hs0f2tJb4/sztFS1FoNLw6njde8GPz5frObZx6fUW9U0jxAyLdRxtcu1lc90FPJKGmBgBLGkq0JaSLu80u9jYHOvjXa+v/AJDFWPqooHvcJXtDXHEb/Af8/h9Bac9IZqucSmJ5aTc1TvmqJaO+SR5vc44vpPzVp9T9ZzbOPT6n0s2quyFfkKlSBkcbS9zifsAb9pKE0NKHmNhg/rZgPsLy70arQrH7sbx4gqzM9v8ACnD9JNYe61HlMnuxw/fVzbSqot6+o/CjVrVMYnG8jE1YYXEH1whTTz0wcHS01QcYfG7xdG5f0sHdWTQR+rzJ5Qq+aOmidc5scu4gi9mK16s0sbxfLHUGoY3/ABgpgZaVE4RVIHg/1EgCwPteaO+SXzCBjv8AeVaVVHvRvGCerMDsoXeZ6IOAmhqPtexhPnjcq4tcbO31NUR+mLwIKtGeutmsqg2nmm+/uIsIBLPckplqGzXMmc99Q8hpxs/cUksbY5jSmSEXzVEwNxwezFa8u9wYt0a8h6nmqYZ6plC5lR54Ji7A1SYaejjxkDxe7wDBxJKkngg+1zaWmO6jiZ7yPVqVckEH35TBVGcsHuWlBjLfpKKSWGSMYWzM8gdweHFVU8sT5ZW0ollD7n/rqtqWdm+9QPwGYBndhL94Ydr3wSQvZDPNH9sss0g8katWYTFuLdOryHqWaeB84pH94/raaRxuDuLUwybkBsUXrJJIbmtVdNHSxuIO7k7vTx4v1VbNX3eJwxTQ1RnDPbECmN/SNnSMimkHhIJAS1ytipZQQRU0rqWN9zHKokorMc4iCPfd3i/gVXUyNYQZ6SeTeMmid6scU6WhsdkhbEGy7iLN4uerVqRj+8zFNv4Jg0+5UeA1dNjey+/C5t4IVsz1TI6mtghglOONpc4sblVqVMTJhjiE1UYC4cGhSTzwNI3lNUnGHx+8b1JipquASMP+L0dxC/M2n1KaJJvJTQeBkkKrJ4qMO/Vl7rTt4MKral1KXAXul7xTycpKYI52nBVQfhyj6jm2cen1PpZtVdkK/I1CfhfUGOnBBuID1YE9bXVTgGzxva3BEB5V2Rr/AOexUT7PgmtOCUQuIJbicA//ACKpH1MDKQR1LGeZmFWJ3tsDBGJYzuprmqIxVsbcEIqgY/MfKHBNZHuoIu4sLvu/0P2gf5hWOd
zLLfUUtQwt++BdiY5WTVUsFVHupmkCVjghA6mrLt5JE8uxYfAG9UcldTxWnNO6naQC7CTgAv8AQFdkK7+exdn56OvpHkb+V7XYoj4tTi99LR1MN/sxktzQoWyiyoY9yx/4s5P3tloystgGR4iMxneDN5jgVi11W7wYTdEFSimnrO0FBIYgCAwb5gATiGVtdLLIPfu7P/siiHebUlmqJ3n2a8xsGUJrXteC1zXC8EO9CpKwTtbIwMfJezDIF+bqV+S/3O20sktHNVx1kFREMe5mFxucuzpMgAa6ekIGZpTGPtt+7O7nvhe8xqR7KCGQVDnMl3IaYwftJVXU1UMc75SIgZcTj7vK7PTMiqmFk1RMQ4hi/fouj15JY6RmYqNsUFPEyKJjfAMaLgmAPkpKkHMFC1jTZ0Eh9LzI0PJTBjbaXVhXtP1TA+FlqVtQ5h9dy8vQAA+wAeiiAqrOqY8D/dkieXsoa26L+ML83aX+op5EFLRh4Z7ueux1YI6WBkZLZWAOcux9ZdUxEMe6RhDH+j1J/RTUrJAznH1HNs49PqfSzaq7IV+RqFFjljiE7GjzExlWXST15eJqV84F7m+BYF2Ss4Aeu7X/AA6pKyYVrIGVceHDiVgTSsDID3gPAbgmCsChqmTRh7JWxgEh3qCFVPjZVzPYaR5xFuEediop7Q3VUaIgPuOEOIa4kqw6fFE8xTU1QGyPCphZFRTQSTB8Trove5zSpXvo5qGSVzB5Q+FwAeqBktlstWZsscovaYpvK7Vdl7NkikaHMexgIcHeoK7EUFbV1TzdTsAa5sbfFxVgCyKers2SdlI32c8XPX4lOm4paShnnY26+90bCQP/ACFVvmbuZKyYF9zp34gMK7OWdTxxC8vMTSAPcl6LDRO7RWZFCWC5pEL2RphLKGufHNwE7fE5FOzv1kySMfET950T3l7XBVLIaanidJLI43BrGi8lUVKyy2RVNRKQw42RDyakL83Uoj/sbjndtsGSnAr5qE1M8jTGHMcWAldmaXfP/tYRunDiCFakr2vibVRfiwEOICkkjFtUMVTW85ETDhVBFaNbWY33Sm9kVzixWdRUs9ZSvpaaNkTQ8umC/Eouj17UOz8rVdWL+6qP/wBTV/eY/wBDl+9OjdTC1ayKd3tHM8sJUzJYZWB0b2G8ODvUFTtdXV08b3RjzMijUZYa6te9vM2ML83aX+oqAiKamfC9/GNdm7Nkl3EbKhpYC4StFxxhdlrOENMwuwlgBf7MHErsPFY/dY2MdUAg4yf1PqObZx6fUgGSekmjYD4XuaQFSQRUndZo8TJg43yJoc1zS1zT4EO9FUhoMpl7sX7p8TuQqrtEU7xc8PnAap2VNrf2IZ5YL+r1MyltimZcyU+SSP2ep6zu7XXNEE4cxWk6KIOF8k8u8kw+zGhUx/Rgg3QH6wP4l6tEzRknBLTS7p/8TSqmsFK/z94nDI1OKi16pobLMBcxjP3GKZlLbEEeASHySNHgx6qa1tMz7g7vUB0StAiEPBkY6XeTTAfbhVOwSSUHd6WLytGG64KmijdVPhMOB+NMDo5GOY5p9Q4XEKd08AlL6Z8MuCeIFWnVQUZuEneZw1t38CZJaAZHC8zMIY+OpYg6prqqCdlaHuDnXOlJj/zDVXmojBO4mil3c4BVVUmmxjGKmoAYFM2ptKrINVVe4Hgxns0KoZFX0VRv4d55X/dIc0lVZhsmntGCSohZV3s3QlBftnigtGQX1MDzgErx9gcFU2juQLmXTtcq4iLGHyxbzezy+zT7KRlLV0Lg+iP6jcIuwkexCNZDTueQe7TAxK3XtkER3Mb5N7IXdGBU09JZTq2H9IvZIN1NDG9UkL6AClveZA3yFFUsMrKWCdk2OTBcZCCgN9S0FNDIAbwHxsAOoUDJJ4a4SyBz8ADcJCiZHVwiTG1hxAXkkKpMBmntN7JPaSMuIVfVyUYJDO6zAsVaYYXODpy9+9nkChbFTU8QjiY30DVSRMs581a8Pb
MC66YnCmnC770Uo80UjfBwVdJNCfLLSS4C4c7SrRlZTA+ermvDOIaEMb/PUTnzSv8AqObZx6ftQf8AX9nJhwie1P2BzbOPT4JQwRyG++RsbQ84uLR+wObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZ4BPOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKN4F/p8mGz//xAAUEQEAAAAAAAAAAAAAAAAAAACg/9oACAECAQE/ABK//8QAFBEBAAAAAAAAAAAAAAAAAAAAoP/aAAgBAwEBPwASv//Z', 'base64'), options => '{"model": "my_paddle_ocr_model"}' ); + +__OUTPUT__ + create_model +-------------- +my_paddle_ocr_model +(1 row) + + part_id | text +---------+------------------ + 0 | Tesseract sample +(1 row) ``` - The `model` is the name of the created model to use for OCR. The model must support the `perform_ocr` operation. !!! Tip +This operation transforms the shape of the data, automatically unnesting collections. As a result, there may be multiple output rows for each input with a new `part_id` column to track the additional dimension. +!!! + +!!! Note Limitations of the model still apply. For example, the [NVIDIA NIM Image OCR API](https://docs.nvidia.com/nim/ingestion/table-extraction/latest/api-reference.html) model provider only supports `png` and `jpeg` image inputs. !!! 
From 9bd74cc87a180eb10bed25c6285d6348b8c9ce2b Mon Sep 17 00:00:00 2001 From: Noah Baculi Date: Wed, 14 May 2025 21:11:31 -0700 Subject: [PATCH 04/28] fix typo --- .../edb-postgres-ai/ai-accelerator/preparers/usage.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx index c148f6c5463..24e20f2c607 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx @@ -10,7 +10,7 @@ description: "Usage of preparers in AI Accelerator Pipelines." The source data preparer can come from a Postgres table or a PGFS volume. Given the different nature of the data sources and the options required for each, you use different functions to create them. !!! Note -You can customze te behavior of the data preparation operation for the preparer with different options. The API for these options is identical between the primitives and the preparer, so you can prototype options with the `aidb.chunk_text()` primitive for use with a scalable preparer that performs the `ChunkText` operation. Learn more in [Primitives](./primitives). +You can customize the behavior of the data preparation operation for the preparer with different options. The API for these options is identical between the primitives and the preparer, so you can prototype options with the `aidb.chunk_text()` primitive for use with a scalable preparer that performs the `ChunkText` operation. Learn more in [Primitives](./primitives). !!! 
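The prototyping workflow described in the note can be sketched as two steps. This is an illustrative sketch only: the table, column, and preparer names are placeholders, and passing the preparer name as `name` is an assumption based on the `name` column shown by `aidb.preparers`.

```sql
-- Step 1: prototype the options with the primitive on a sample string.
SELECT * FROM aidb.chunk_text(
    input => 'This sentence should be its own chunk. This too.',
    options => '{"desired_length": 100}'
);

-- Step 2: reuse the same options JSON in a preparer.
-- Assumption: `name` is the parameter for the preparer name;
-- all table and column names here are placeholders.
SELECT aidb.create_table_preparer(
    name => 'test_preparer',
    operation => 'ChunkText',
    source_table => 'test_source_table',
    source_data_column => 'content',
    destination_table => 'chunked_data_destination_table',
    destination_data_column => 'chunk',
    source_key_column => 'id',
    destination_key_column => 'id',
    options => '{"desired_length": 100}'::JSONB
);
```

Keeping the options as a single JSON literal makes it straightforward to copy the value from the prototype query into the preparer definition unchanged.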
## Preparer for a table data source From 2f4d31b9b21d965bb490b80754c821f4205ada5b Mon Sep 17 00:00:00 2001 From: Noah Baculi Date: Wed, 14 May 2025 21:13:02 -0700 Subject: [PATCH 05/28] update preparer usage with better column name for unnested chunk --- .../edb-postgres-ai/ai-accelerator/preparers/usage.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx index 24e20f2c607..43213770488 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx @@ -41,7 +41,7 @@ SELECT aidb.create_table_preparer( source_table => 'test_source_table', source_data_column => 'content', destination_table => 'chunked_data_destination_table', - destination_data_column => 'chunks', + destination_data_column => 'chunk', source_key_column => 'id', destination_key_column => 'id', options => '{"desired_length": 100}'::JSONB -- Configuration for the ChunkText operation @@ -73,7 +73,7 @@ SELECT aidb.create_volume_preparer( operation => 'ChunkText', source_volume_name => 'test_volume', destination_table => 'chunked_data_destination_table', - destination_data_column => 'chunks', + destination_data_column => 'chunk', destination_key_column => 'id', options => '{"desired_length": 100}'::JSONB -- Configuration for the ChunkText operation ); @@ -108,7 +108,7 @@ SELECT * FROM aidb.preparers; __OUTPUT__ id | name | operation | destination_schema | destination_table | destination_key_column | destination_data_column | options | source_type | source_schema | source_table | source_data_column | source_key_column | source_volume_name 
----+---------------+-----------+--------------------+--------------------------------+------------------------+-------------------------+-------------------------+-------------+---------------+-------------------+--------------------+-------------------+-------------------- - 1 | test_preparer | ChunkText | public | chunked_data_destination_table | id | chunks | {"desired_length": 100} | Table | public | test_source_table | content | id | + 1 | test_preparer | ChunkText | public | chunked_data_destination_table | id | chunk | {"desired_length": 100} | Table | public | test_source_table | content | id | (1 row) ``` From 23feea1870eeee2e6d04267ad82f07735b0f6c85 Mon Sep 17 00:00:00 2001 From: Noah Baculi Date: Wed, 14 May 2025 21:21:33 -0700 Subject: [PATCH 06/28] add output to chunk auto processing ex --- .../examples/chunk_text_auto_processing.mdx | 48 ++++++++++++++++++- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx index 629bd2f3daf..1991b802cba 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx @@ -4,7 +4,11 @@ navTitle: Auto Processing description: Examples of using the preparer auto processing in AI Accelerator. --- -Examples of using preparer auto processing with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator. +Example of using preparer auto processing with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator. + +!!! Note +Many of the small confirmation output notices have been removed for brevity. +!!! 
## Preparer with table data source @@ -22,7 +26,7 @@ SELECT aidb.create_table_preparer( source_table => 'source_table__1628', source_data_column => 'content', destination_table => 'chunked_data__1628', - destination_data_column => 'chunks', + destination_data_column => 'chunk', source_key_column => 'id', destination_key_column => 'id', options => '{"desired_length": 1, "max_length": 1000}'::JSONB -- Configuration for the ChunkText operation @@ -32,14 +36,54 @@ SELECT aidb.set_auto_preparer('preparer__1628', 'Live'); INSERT INTO source_table__1628 VALUES (1, 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. This enables processing or storage of data in manageable parts.'); +``` + +```sql SELECT * FROM chunked_data__1628; +__OUTPUT__ + id | part_id | unique_id | chunk +----+---------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------- + 1 | 0 | 1.part.0 | This is a significantly longer text example that might require splitting into smaller chunks. + 1 | 1 | 1.part.1 | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. + 1 | 2 | 1.part.2 | This enables processing or storage of data in manageable parts. +(3 rows) +``` + +```sql INSERT INTO source_table__1628 VALUES (2, 'This sentence should be its own chunk. 
This too.'); +``` + +```sql SELECT * FROM chunked_data__1628; +__OUTPUT__ + id | part_id | unique_id | chunk +----+---------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------- + 1 | 0 | 1.part.0 | This is a significantly longer text example that might require splitting into smaller chunks. + 1 | 1 | 1.part.1 | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. + 1 | 2 | 1.part.2 | This enables processing or storage of data in manageable parts. + 2 | 0 | 2.part.0 | This sentence should be its own chunk. + 2 | 1 | 2.part.1 | This too. +(5 rows) +``` + +```sql DELETE FROM source_table__1628 WHERE id = 1; +``` + +```sql SELECT * FROM chunked_data__1628; +__OUTPUT__ + id | part_id | unique_id | chunk +----+---------+-----------+---------------------------------------- + 2 | 0 | 2.part.0 | This sentence should be its own chunk. + 2 | 1 | 2.part.1 | This too. 
+(2 rows) +``` + +```sql SELECT aidb.set_auto_preparer('preparer__1628', 'Disabled'); ``` From 8439f24eba101f4bbd7687adacd96fd0e725d90c Mon Sep 17 00:00:00 2001 From: Noah Baculi Date: Wed, 14 May 2025 21:31:23 -0700 Subject: [PATCH 07/28] add output to chunk text ex --- .../preparers/examples/chunk_text.mdx | 68 +++++++++++++++---- 1 file changed, 55 insertions(+), 13 deletions(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx index 906f59d5d99..8d92bb4d5d7 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx @@ -11,20 +11,61 @@ These examples use preparers with the [ChunkText operation](../primitives#chunk- -- Only specify a desired length SELECT * FROM aidb.chunk_text('This is a simple test sentence.', '{"desired_length": 10}'); +__OUTPUT__ + part_id | chunk +---------+----------- + 0 | This is a + 1 | simple + 2 | test + 3 | sentence. +(4 rows) +``` + +```sql -- Specify a desired length and a maximum length SELECT * FROM aidb.chunk_text('This is a simple test sentence.', '{"desired_length": 10, "max_length": 15}'); +__OUTPUT__ + part_id | chunk +---------+------------- + 0 | This is a + 1 | simple test + 2 | sentence. +(3 rows) +``` + +```sql -- Named parameters -SELECT - chunk_id, - chunk -FROM aidb.chunk_text( +SELECT * FROM aidb.chunk_text( input => 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. 
This enables processing or storage of data in manageable parts.', - options => '{"desired_length": 10}' + options => '{"desired_length": 40}' ); +__OUTPUT__ + part_id | chunk +---------+---------------------------------------- + 0 | This is a significantly longer text + 1 | example that might require splitting + 2 | into smaller chunks. + 3 | The purpose of this function is to + 4 | partition text data into segments of a + 5 | specified maximum length, for example, + 6 | this sentence 145 is characters. + 7 | This enables processing or storage of + 8 | data in manageable parts. +(9 rows) +``` + +```sql -- Semantic chunking to split into the largest continuous semantic chunk that fits in the max_length SELECT * FROM aidb.chunk_text('This sentence should be its own chunk. This too.', '{"desired_length": 1, "max_length": 1000}'); + +__OUTPUT__ + part_id | chunk +---------+---------------------------------------- + 0 | This sentence should be its own chunk. + 1 | This too. +(2 rows) ``` ## Preparer with table data source @@ -56,12 +97,13 @@ SELECT aidb.bulk_data_preparation('preparer__1628'); SELECT * FROM chunked_data__1628; --- Unnest chunk text arrays -SELECT - id, - chunk_number, - chunk -FROM - chunked_data__1628, - unnest(chunks) WITH ORDINALITY AS chunk_list(chunk, chunk_number); +__OUTPUT__ + id | part_id | unique_id | chunks +----+---------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------- + 1 | 0 | 1.part.0 | This is a significantly longer text example that might require splitting into smaller chunks. + 1 | 1 | 1.part.1 | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. + 1 | 2 | 1.part.2 | This enables processing or storage of data in manageable parts. + 2 | 0 | 2.part.0 | This sentence should be its own chunk. + 2 | 1 | 2.part.1 | This too. 
+(5 rows) ``` From f008c1b5b510d2a4dec1f7ab0a32e4f43d082538 Mon Sep 17 00:00:00 2001 From: Noah Baculi Date: Wed, 14 May 2025 21:33:45 -0700 Subject: [PATCH 08/28] add output to parse html ex --- .../preparers/examples/parse_html.mdx | 56 +++++++++++++++++++ 1 file changed, 56 insertions(+) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_html.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_html.mdx index bf611ad5515..19e82bd8e6d 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_html.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_html.mdx @@ -14,6 +14,17 @@ SELECT * FROM aidb.parse_html( '

Hello World Heading

Hello World paragraph

' ); +__OUTPUT__ + parse_html +----------------------- + Hello World Heading + + + + Hello World paragraph+ + +(1 row) +``` + +```sql -- Parse Hello World HTML to plaintext SELECT * FROM aidb.parse_html( html => @@ -33,6 +44,24 @@ SELECT * FROM aidb.parse_html( options => '{"method": "StructuredPlaintext"}' -- Default ); +__OUTPUT__ + parse_html +----------------------------------------------------------- + Hello, world! + + + + This is my first web page. + + + + It contains some bold text, some italic test, and a link.+ + + + Postgres Logo Image + + List item + + List item + + List item + + +(1 row) +``` + +```sql -- Parse Hello World HTML to markdown-esque text that retains some syntactical context SELECT * FROM aidb.parse_html( html => @@ -51,6 +80,22 @@ SELECT * FROM aidb.parse_html( ', options => '{"method": "StructuredMarkdown"}' ); + +__OUTPUT__ + parse_html +--------------------------------------------------------------------------------------- + # Hello, world! + + + + This is my first web page. + + + + It contains some **bold text**, some *italic test*, and a [link](https://google.com).+ + + + ![Postgres Logo Image](postgres_logo.png) + + 1. List item + + 2. List item + + 3. 
List item + + +(1 row) ``` ## Preparer with table data source @@ -81,4 +126,15 @@ SELECT aidb.create_table_preparer( SELECT aidb.bulk_data_preparation('preparer__2772'); SELECT * FROM destination_table__2772; + +__OUTPUT__ + id | parsed_html +----+------------------------------------------------------- + 1 | Hello World Heading + + | + + | Hello World paragraph + + | + 2 | This is some bold text, some italic test, and a link.+ + | +(2 rows) ``` From 57a6d91c0c49373118d7966471fe70b9f530e509 Mon Sep 17 00:00:00 2001 From: Noah Baculi Date: Wed, 14 May 2025 21:34:38 -0700 Subject: [PATCH 09/28] add output to parse pdf ex --- .../preparers/examples/parse_pdf.mdx | 32 ++++++++++++++----- 1 file changed, 24 insertions(+), 8 deletions(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx index 8f2f3f99328..0c393de126a 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx @@ -15,11 +15,27 @@ SELECT * FROM aidb.parse_pdf( 
decode('255044462d312e340a25b89a929d0a312030206f626a3c3c2f547970652f436174616c6f672f50616765732033203020523e3e0a656e646f626a0a322030206f626a3c3c2f50726f64756365722847656d426f782047656d426f782e50646620312e37202831372e302e33352e313034323b202e4e4554204672616d65776f726b29292f4372656174696f6e4461746528443a32303231313032383135313732312b303227303027293e3e0a656e646f626a0a332030206f626a3c3c2f547970652f50616765732f4b6964735b34203020525d2f436f756e7420312f4d65646961426f785b302030203539352e3332203834312e39325d3e3e0a656e646f626a0a342030206f626a3c3c2f547970652f506167652f506172656e742033203020522f5265736f75726365733c3c2f466f6e743c3c2f46302036203020523e3e3e3e2f436f6e74656e74732035203020523e3e0a656e646f626a0a352030206f626a3c3c2f4c656e6774682035393e3e73747265616d0a42540a2f46302031322054660a3120302030203120313030203730322e3733363636363720546d0a2848656c6c6f20576f726c642129546a0a45540a656e6473747265616d0a656e646f626a0a362030206f626a3c3c2f547970652f466f6e742f537562747970652f54797065312f42617365466f6e742f48656c7665746963612f4669727374436861722033322f4c61737443686172203131342f5769647468732037203020522f466f6e7444657363726970746f722038203020523e3e0a656e646f626a0a372030206f626a5b3237382032373820302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203732322030203020302030203020302030203020302030203020302030203020393434203020302030203020302030203020302030203020302030203535362035353620302030203020302030203020323232203020302035353620302030203333335d0a656e646f626a0a382030206f626a3c3c2f547970652f466f6e7444657363726970746f722f466c6167732033322f466f6e744e616d652f48656c7665746963612f466f6e7446616d696c792848656c766574696361292f466f6e74576569676874203530302f4974616c6963416e676c6520302f466f6e7442426f785b2d313636202d3232352031303030203933315d2f436170486569676874203731382f58486569676874203532332f417363656e74203731382f44657363656e74202d3230372f5374656d482037362f5374656d562038383e3e0a656e646f626a0a787265660a302039
0a303030303030303030302036353533352066200a30303030303030303135203030303030206e200a30303030303030303539203030303030206e200a30303030303030313739203030303030206e200a30303030303030323537203030303030206e200a30303030303030333436203030303030206e200a30303030303030343531203030303030206e200a30303030303030353733203030303030206e200a30303030303030373733203030303030206e200a747261696c65720a3c3c2f526f6f742031203020522f49445b3c39333932413539463342453742383430383035443632373436453841344632393e3c39333932413539463342453742383430383035443632373436453841344632393e5d2f496e666f2032203020522f53697a6520393e3e0a7374617274787265660a3938380a2525454f460a', 'hex') ); +__OUTPUT__ + part_id | text +---------+-------------- + 0 | Hello World!+ + | +(1 row) +``` + +```sql -- Manually specify the default options SELECT * FROM aidb.parse_pdf( bytes => decode('255044462d312e340a25b89a929d0a312030206f626a3c3c2f547970652f436174616c6f672f50616765732033203020523e3e0a656e646f626a0a322030206f626a3c3c2f50726f64756365722847656d426f782047656d426f782e50646620312e37202831372e302e33352e313034323b202e4e4554204672616d65776f726b29292f4372656174696f6e4461746528443a32303231313032383135313732312b303227303027293e3e0a656e646f626a0a332030206f626a3c3c2f547970652f50616765732f4b6964735b34203020525d2f436f756e7420312f4d65646961426f785b302030203539352e3332203834312e39325d3e3e0a656e646f626a0a342030206f626a3c3c2f547970652f506167652f506172656e742033203020522f5265736f75726365733c3c2f466f6e743c3c2f46302036203020523e3e3e3e2f436f6e74656e74732035203020523e3e0a656e646f626a0a352030206f626a3c3c2f4c656e6774682035393e3e73747265616d0a42540a2f46302031322054660a3120302030203120313030203730322e3733363636363720546d0a2848656c6c6f20576f726c642129546a0a45540a656e6473747265616d0a656e646f626a0a362030206f626a3c3c2f547970652f466f6e742f537562747970652f54797065312f42617365466f6e742f48656c7665746963612f4669727374436861722033322f4c61737443686172203131342f5769647468732037203020522f466f6e7444657363726970746f722038203020523e3e0a656e646f626a0a372030206f626a5b323
7382032373820302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203732322030203020302030203020302030203020302030203020302030203020393434203020302030203020302030203020302030203020302030203535362035353620302030203020302030203020323232203020302035353620302030203333335d0a656e646f626a0a382030206f626a3c3c2f547970652f466f6e7444657363726970746f722f466c6167732033322f466f6e744e616d652f48656c7665746963612f466f6e7446616d696c792848656c766574696361292f466f6e74576569676874203530302f4974616c6963416e676c6520302f466f6e7442426f785b2d313636202d3232352031303030203933315d2f436170486569676874203731382f58486569676874203532332f417363656e74203731382f44657363656e74202d3230372f5374656d482037362f5374656d562038383e3e0a656e646f626a0a787265660a3020390a303030303030303030302036353533352066200a30303030303030303135203030303030206e200a30303030303030303539203030303030206e200a30303030303030313739203030303030206e200a30303030303030323537203030303030206e200a30303030303030333436203030303030206e200a30303030303030343531203030303030206e200a30303030303030353733203030303030206e200a30303030303030373733203030303030206e200a747261696c65720a3c3c2f526f6f742031203020522f49445b3c39333932413539463342453742383430383035443632373436453841344632393e3c39333932413539463342453742383430383035443632373436453841344632393e5d2f496e666f2032203020522f53697a6520393e3e0a7374617274787265660a3938380a2525454f460a', 'hex'), options => '{"method": "Structured", "allow_partial_parsing": true}' -- Default ); + +__OUTPUT__ + part_id | text +---------+-------------- + 0 | Hello World!+ + | +(1 row) ``` ## Preparer with table data source @@ -51,12 +67,12 @@ SELECT aidb.bulk_data_preparation('preparer__6124'); SELECT * FROM destination_table__6124; --- Unnest chunk text arrays -SELECT - id, - page_number, - parsed_text -FROM - destination_table__6124, - unnest(parsed_pdf) WITH ORDINALITY AS pdf_pages(parsed_text, page_number); +__OUTPUT__ + id | 
part_id | unique_id | parsed_pdf +----+---------+-----------+-------------- + 1 | 0 | 1.part.0 | Hello World!+ + | | | + 2 | 0 | 2.part.0 | Hello World!+ + | | | +(2 rows) ``` From 56f1f9a766e6fe3af126f3bab6786323fb56bb02 Mon Sep 17 00:00:00 2001 From: Noah Baculi Date: Wed, 14 May 2025 21:36:09 -0700 Subject: [PATCH 10/28] add output to ocr ex --- .../preparers/examples/perform_ocr.mdx | 46 +++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx index 0985759af7c..9f2fbcd5159 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx @@ -28,11 +28,25 @@ SELECT * FROM aidb.perform_ocr( options => '{"model": "my_paddle_ocr_model"}' ); +__OUTPUT__ + part_id | text +---------+------------------ + 0 | Tesseract sample +(1 row) +``` + +```sql -- Positional arguments SELECT * FROM aidb.perform_ocr( 
decode('/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAUFBQUFBQUGBgUICAcICAsKCQkKCxEMDQwNDBEaEBMQEBMQGhcbFhUWGxcpIBwcICkvJyUnLzkzMzlHREddXX0BBQUFBQUFBQYGBQgIBwgICwoJCQoLEQwNDA0MERoQExAQExAaFxsWFRYbFykgHBwgKS8nJScvOTMzOUdER11dff/CABEIAWYDKgMBIgACEQEDEQH/xAAvAAEAAQUBAQEAAAAAAAAAAAAAAQIDBAcIBgkFAQEAAAAAAAAAAAAAAAAAAAAA/9oADAMBAAIQAxAAAADsuKLhCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRE2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAeS/INiNdjYj8z9MAAAAAAAAAAFJUiQAAAAAAAAA8l60AAAAAAAAAAAAAtXbV0pqpqAAAAAAAAAAESNE617AwD5kdJap+gp5L2PC/459AJ5c/ROko4DxD6EU+I4rPoY+enVJuJzfpI79ngXrU2LT88PfnaEcVfhHek8K9jno6fnrmH0A5c/S5YO89icId3FSOODsiOC847meN4qPoPPzx6iN0zw9hHeD5yeqO8Y+cfeZ6ev5uepO+nzz71OD/AKDfPnsc2BHz7xj6HPBe9AAAAAAAAAAALV21dKaqagAAAAAAAAAABgZ+AcB9U8rddHGO7vC9XHz43/me6NMbI3JxwdYcV9C7HNYbf17pY1f0lo3uI4n3V+3+oaE745D6qNBYHPvdp89uldE78PWb15730as1P+h+cbw2VrDZw4F7757Nd+52Vwsd76d8b6I8H6LZ2sjWvf8AwN9Bjh7qvlbrY0P7bxfuDkP6Q/M/6UnPuDbzzRP6f5npD9GxuT9Y557i576EAAAAAAAAAAALV21dKaqagAAAAAAAAAAB+f8AoDkPrmscZeZ7zHN3ot3jgnI7uHhuRe8xwb0NuwaL5774HEXUXuxzt0HeHCE93DgLq7Z44Rx+9hqWxuEcQ9vRJHMHUA4O/Y7ZGpOWfoCOJMTuYci9cVDmPf8A+6NIei2bQfMjZn5vf5wr2z+kOReivYDgivvMeI9wAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAiRh5gAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVu2ZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHFd2xfIgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIrD//EADIQAAAFBAEEAQMDAgcBAAAAAAABAwQFAgYHERQIEDI0YBIhQBYxNxUXExgiMzZBUTX/2gAIAQEAAQgAM/2IaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMGejIjCn+9R8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl+PWJq+bPtt3Q0mv7s40H92caA8uY2IvtEzEXOsUZCK/FMGYI/w6vsQjr6tGWlFIljTv/v8qvzT7Uewl+PV+32yLgyLyLO0y7qS6V4JjHP3dOLLBaZCuv8Aobk+k6AFiWehYtuM4Fv/AKgRgxVVsEezBjf3BmewRgz0NmCP7gwdX/h1aLY2exv/AMI/uDFVWwZiqrY6nZyZhIi0zjMSvXL7G9ouXdPYx9QIwZ6BfuDq+w+obH1D6h9Q+ob0Nj6hsGMOfzu9HiPqBH+TX5p9qPYS/I0LiPcFOjplL68nBdZJuiqsreWX76v+4lLesZ/jvPVutq5YsIZrf3O8/TNyZ3y5I2OTSFhY7Hud7uaJTJxuRsoYkuNCPuS5L4joCyFrtDGYzJmSQd1xU3B5xxeknLr4ayVXkS3FFHubsyubKqRgYJpYGe7sbUzNdr5ZyDjC4k4a8MgTSieOLgmIiAzpdUPa0qzPEUNll3e8RO3FlDMt0zN0r2jZR4kzyuib6vFOWrzgL0ZWrdFz3JHWlBSM3Jr35ljME8sxgZayc8WEzrmDw9lx9kSMmIaRyVCZPim8VXe2LbfzKudmvWlAqPQzNma40blVs+1KMUZ7WQ/qJ42zFeFrXY3te8r7vBnY1tSE46j5TNmY3rpeMuC182Y1bFMr4OyO8yFbjs5LM+QbptfKRos6Es5Zg/x5dkyvnJ2ILppYzNUnmjNDp28g5STy/h+XZFI2fciV22nDzidWYb7gLunq6GdjZ5vZrTMqxeSsoYluGiPuGCmWdww7CUZYc/nZ6MjX0yx/bjiVcsl83ZiXcPWD93mrD7xs6kMb36wyBbbeUQ/Hr80+1HsJfk3Ee4KdHTL/ACcM9SriIxjOKIYeynb+M05ZV5/mxt0K3SweZQZXFCdR2PZ2adR91RNn9TEtAs20XcJ3zhLLTqPSnb+sei4McvrYhrMv28sLSD+NcodRWPLqj14q48cwOP4ti5fWYtdbOjKbm5Zz/NfbtJERZgytb2TG0WbS2ZNaT6ZJMluma02U7c0tLvf+vs6h8K4muKidcSnVLaTYq6Y5tcLy6svQU296q5RZC2bYjKOnSAaxWN457QommqnWnXaOFrQsqfonYnqxMv6PZowyZHiqySMVHovvmmxrqtW/nV3xtu9VWkk0bijHuD8qz9Ekd9Wnbl4Qpsri
jMq4cxdG1QUHkbqGQu+3ZSChukv1r2GfUKHeYTQrYMGsYyaMWvVikmU9ai9ONo5tH2LaTZr1VoUfo6AVPAf8VQQx5ANbjzXWzdU0kRERdS1uspGw/wCrH0uy67yy5Rirhn+eXY6pZZda7ouMFsdSNqWxARcO1vDqLtS7rblIRx0pyaqc9cUaZfj1+afaj2EvybiPcFOjpl/k4ZphlZvGlxtkenJGyJn+sQk//bTH5Ee17xxknfqNqxV75zj7Euv9OO5CybEu9qi5eZ1xRbthpxsnCWdlF3a+E4K4ZOyrstTMMCu8Xv7AtgPoeUkGHTBKvm15ykRSszibQzc5bXCjjrHLpFNZvk55ivGqEcS8k6jpDAs6/jekoqDZ3mZz7xWMgZyQRxHbLXKGQXn6lj7CseCR+prPS8ZOZ7YvYvqohl3Vq29LJ9N11sZexW8IcjJM4pg6fvsU5ou/IN81xVfVgW4izRhqojxdZZkW9isN89wru917NfzuK8f3GVZvst2UzxjeDJCFz7dkypj3HiAwhiOyntmxdwymYCtG0Mdz6KfSWW0L1pLOP80tu3Vl/wDYs4WP/wAMtQdVf/CoMYC/iWCFm3OhaGYlJVw2dJOUUXDbqZu1kytJK3qOmaHVj7BcvVMM/wA8ux1Vw6qM/b8xTj23cYXhacRJNrqtbGFpwT+ZfYVua1LudTDqF/Hr80+1HsJfkyyCrqKlWyWEcSXpZV8HLzCqdKidSdd/dO00hLqzVkHavUi/QOLWxDg+myXZTk7l7EDXIyDZ2zY2P1D2jQbGJa4PypfEmg7vFTH1uHZX6NN1g7KlkSqjm0Xlk9RN2pkxlsQ4haY4aLuXWXcNMsip0SLJlY3UNahVMIm2sAXpc03RLX5eltqSNgTdvw+AMe3LYTe4051dJJwiskrceAr+tafVkrO/QfUJeRExmrs6e7ytqSil7XtO3Zaaxq2hL9mMBZFs2YqkbMe48z/exJs5vFeLo/GsWskWZscOskW22aMLKxbmi25q30zpFX7fbL+B1btk1bhtttb3UrFJUMG1qdPd2zs4lL33lXGTfIFsNY1tGWDn+zf8VhCRuA7/ALwVXf3vi3HWXLNvOPoLKOI7zufJSE9GfcZ/xjdl+SNuLwdrMXMbbsAwc56sefvm2othCYpt2UtOw4yHlLbs6i/MmzkFWlYnUJZlNcbCW50/XxdMyUnesZGs4dg0jmOOMS3lbmVHFxSF62bE31AuYiTVwxmCxpBeu1VMVZvv1y3TuawLHi7AgUYlh+PX5p9qPYS/K120Q0Q0Q0Q0NENENENDRDQ0NENF30Q0NENENENDRdtF+3fQ0Q0WxoaIaGiGiGiGi7q/7dQwwnWWd3lRl+/30Q0NENECIiGi/Jr80+1HsJfAzLYoYR6KhrIfn1+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2WqIlaa6uUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgProVqoOkVUFUDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUED
bUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEKaaaf2H//xABDEAABAwEFBQcBBAYIBwAAAAABAAIDBAUREpKhEBNhYrEGFDEyQVFgQCEiU1QVQlBSgpMjM0NVcoGywgcgMDSRs9H/2gAIAQEACT8A8SjojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojovXZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fUdo6SinczGI5XLttZmddtrMzrttZv8xV8dXRy3hs0f2tJb4/sztFS1FoNLw6njde8GPz5frObZx6fUW9U0jxAyLdRxtcu1lc90FPJKGmBgBLGkq0JaSLu80u9jYHOvjXa+v/AJDFWPqooHvcJXtDXHEb/Af8/h9Bac9IZqucSmJ5aTc1TvmqJaO+SR5vc44vpPzVp9T9ZzbOPT6n0s2quyFfkKlSBkcbS9zifsAb9pKE0NKHmNhg/rZgPsLy70arQrH7sbx4gqzM9v8ACnD9JNYe61HlMnuxw/fVzbSqot6+o/CjVrVMYnG8jE1YYXEH1whTTz0wcHS01QcYfG7xdG5f0sHdWTQR+rzJ5Qq+aOmidc5scu4gi9mK16s0sbxfLHUGoY3/ABgpgZaVE4RVIHg/1EgCwPteaO+SXzCBjv8AeVaVVHvRvGCerMDsoXeZ6IOAmhqPtexhPnjcq4tcbO31NUR+mLwIKtGeutmsqg2nmm+/uIsIBLPckplqGzXMmc99Q8hpxs/cUksbY5jSmSEXzVEwNxwezFa8u9wYt0a8h6nmqYZ6plC5lR54Ji7A1SYaejjxkDxe7wDBxJKkngg+1zaWmO6jiZ7yPVqVckEH35TBVGcsHuWlBjLfpKKSWGSMYWzM8gdweHFVU8sT5ZW0ollD7n/rqtqWdm+9QPwGYBndhL94Ydr3wSQvZDPNH9sss0g8katWYTFuLdOryHqWaeB84pH94/raaRxuDuLUwybkBsUXrJJIbmtVdNHSxuIO7k7vTx4v1VbNX3eJwxTQ1RnDPbECmN/SNnSMimkHhIJAS1ytipZQQRU0rqWN9zHKokorMc4iCPfd3i/gVXUyNYQZ6SeTeMmid6scU6WhsdkhbEGy7iLN4uerVqRj+8zFNv4Jg0+5UeA1dNjey+/C5t4IVsz1TI6mtghglOONpc4sblVqVMTJhjiE1UYC4cGhSTzwNI3lNUnGHx+8b1JipquASMP+L0dxC/M2n1KaJJvJTQeBkkKrJ4qMO/Vl7rTt4MKral1KXAXul7xTycpKYI52nBVQfhyj6jm2cen1PpZtVdkK/I1CfhfUGOnBBuID1YE9bXVTgGzxva3BEB5V2Rr/AOexUT7PgmtOCUQuIJbicA//ACKpH1MDKQR1LGeZmFWJ3tsDBGJYzuprmqIxVsbcEIqgY/MfKHBNZHuoIu4sLvu/0P2gf5hWOdzLLfUUtQwt++BdiY5WTVUsFVHupmkCVjghA6mrLt5JE8uxYfAG9Ucl
dTxWnNO6naQC7CTgAv8AQFdkK7+exdn56OvpHkb+V7XYoj4tTi99LR1MN/sxktzQoWyiyoY9yx/4s5P3tloystgGR4iMxneDN5jgVi11W7wYTdEFSimnrO0FBIYgCAwb5gATiGVtdLLIPfu7P/siiHebUlmqJ3n2a8xsGUJrXteC1zXC8EO9CpKwTtbIwMfJezDIF+bqV+S/3O20sktHNVx1kFREMe5mFxucuzpMgAa6ekIGZpTGPtt+7O7nvhe8xqR7KCGQVDnMl3IaYwftJVXU1UMc75SIgZcTj7vK7PTMiqmFk1RMQ4hi/fouj15JY6RmYqNsUFPEyKJjfAMaLgmAPkpKkHMFC1jTZ0Eh9LzI0PJTBjbaXVhXtP1TA+FlqVtQ5h9dy8vQAA+wAeiiAqrOqY8D/dkieXsoa26L+ML83aX+op5EFLRh4Z7ueux1YI6WBkZLZWAOcux9ZdUxEMe6RhDH+j1J/RTUrJAznH1HNs49PqfSzaq7IV+RqFFjljiE7GjzExlWXST15eJqV84F7m+BYF2Ss4Aeu7X/AA6pKyYVrIGVceHDiVgTSsDID3gPAbgmCsChqmTRh7JWxgEh3qCFVPjZVzPYaR5xFuEediop7Q3VUaIgPuOEOIa4kqw6fFE8xTU1QGyPCphZFRTQSTB8Trove5zSpXvo5qGSVzB5Q+FwAeqBktlstWZsscovaYpvK7Vdl7NkikaHMexgIcHeoK7EUFbV1TzdTsAa5sbfFxVgCyKers2SdlI32c8XPX4lOm4paShnnY26+90bCQP/ACFVvmbuZKyYF9zp34gMK7OWdTxxC8vMTSAPcl6LDRO7RWZFCWC5pEL2RphLKGufHNwE7fE5FOzv1kySMfET950T3l7XBVLIaanidJLI43BrGi8lUVKyy2RVNRKQw42RDyakL83Uoj/sbjndtsGSnAr5qE1M8jTGHMcWAldmaXfP/tYRunDiCFakr2vibVRfiwEOICkkjFtUMVTW85ETDhVBFaNbWY33Sm9kVzixWdRUs9ZSvpaaNkTQ8umC/Eouj17UOz8rVdWL+6qP/wBTV/eY/wBDl+9OjdTC1ayKd3tHM8sJUzJYZWB0b2G8ODvUFTtdXV08b3RjzMijUZYa6te9vM2ML83aX+oqAiKamfC9/GNdm7Nkl3EbKhpYC4StFxxhdlrOENMwuwlgBf7MHErsPFY/dY2MdUAg4yf1PqObZx6fUgGSekmjYD4XuaQFSQRUndZo8TJg43yJoc1zS1zT4EO9FUhoMpl7sX7p8TuQqrtEU7xc8PnAap2VNrf2IZ5YL+r1MyltimZcyU+SSP2ep6zu7XXNEE4cxWk6KIOF8k8u8kw+zGhUx/Rgg3QH6wP4l6tEzRknBLTS7p/8TSqmsFK/z94nDI1OKi16pobLMBcxjP3GKZlLbEEeASHySNHgx6qa1tMz7g7vUB0StAiEPBkY6XeTTAfbhVOwSSUHd6WLytGG64KmijdVPhMOB+NMDo5GOY5p9Q4XEKd08AlL6Z8MuCeIFWnVQUZuEneZw1t38CZJaAZHC8zMIY+OpYg6prqqCdlaHuDnXOlJj/zDVXmojBO4mil3c4BVVUmmxjGKmoAYFM2ptKrINVVe4Hgxns0KoZFX0VRv4d55X/dIc0lVZhsmntGCSohZV3s3QlBftnigtGQX1MDzgErx9gcFU2juQLmXTtcq4iLGHyxbzezy+zT7KRlLV0Lg+iP6jcIuwkexCNZDTueQe7TAxK3XtkER3Mb5N7IXdGBU09JZTq2H9IvZIN1NDG9UkL6AClveZA3yFFUsMrKWCdk2OTBcZCCgN9S0FNDIAbwHxsAOoUDJJ4a4SyBz8ADcJCiZHVwiTG1hxAXkkKpMBmntN7JPaSMuIVfVyUYJDO6zAsVaYYXODpy9+9nkChbFTU8QjiY30DVSRMs581a8PbMC66YnCmnC770Uo80UjfBwVdJNCfLLSS4C4c7SrRlZTA+ermvDOIaE
Mb/PUTnzSv8AqObZx6ftQf8AX9nJhwie1P2BzbOPT4JQwRyG++RsbQ84uLR+wObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZ4BPOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKN4F/p8mGz//xAAUEQEAAAAAAAAAAAAAAAAAAACg/9oACAECAQE/ABK//8QAFBEBAAAAAAAAAAAAAAAAAAAAoP/aAAgBAwEBPwASv//Z', 'base64'), '{"model": "my_paddle_ocr_model"}' ); + +__OUTPUT__ + part_id | text +---------+------------------ + 0 | Tesseract sample +(1 row) ``` ## Preparer with table data source @@ -62,6 +76,38 @@ SELECT aidb.create_table_preparer( SELECT aidb.bulk_data_preparation('preparer__1527'); SELECT * FROM ocr_data__1527; + +__OUTPUT__ + id | part_id | unique_id | parsed__text +----+---------+-----------+-------------------------------------------- + 1 | 0 | 1.part.0 | Trunch Parish Council + 1 | 1 | 1.part.1 | BANK RECONCILIATION AS AT 31STOCTOBER 2019 + 1 | 2 | 1.part.2 | Account: + 1 | 3 | 1.part.3 | 14,389.43 + 1 | 4 | 1.part.4 | BANK STATEMENT BALANCE 3OTH SEPTEMBER 2019 + 1 | 5 | 1.part.5 | 83.60 + 1 | 6 | 1.part.6 | PREVIOUS OUTSTANDING CHEQUES + 1 | 7 | 1.part.7 | 14,305.83 + 1 | 8 | 1.part.8 | CASHBOOK BALANCE 31ST OCTOBER 2019 + 1 | 9 | 1.part.9 | ADD CHEQUES OUTSTANDING: + 1 | 10 | 1.part.10 | * + 1 | 11 | 1.part.11 | 101719 + 1 | 12 | 1.part.12 | 83.60* + 1 | 13 | 1.part.13 | * + 1 | 14 | 1.part.14 | * + 1 | 15 | 1.part.15 | 83.60 + 1 | 16 | 1.part.16 | OUTSTANDING CHEQUES + 1 | 17 | 1.part.17 | 9,148.00 + 1 | 18 | 1.part.18 | RECEIPTS + 1 | 19 | 1.part.19 | 4,309.94 + 1 | 20 | 1.part.20 | PAYMENTS + 1 | 21 | 1.part.21 | 19,227.49 + 1 | 22 | 1.part.22 | BALANCE 31STOCTOBER2019 + 1 | 23 | 1.part.23 | 19,227.49* + 1 | 24 | 
1.part.24 | BALANCE AS PER BANK STATEMENT + 1 | 25 | 1.part.25 | 0.00 + 1 | 26 | 1.part.26 | DIFFERENCE +(27 rows) ``` ## Model compatibility From 22f9cf9fc826c366f38cadd696123d00cdf475da Mon Sep 17 00:00:00 2001 From: Noah Baculi Date: Wed, 14 May 2025 21:38:11 -0700 Subject: [PATCH 11/28] add output to summarize ex --- .../preparers/examples/summarize_text.mdx | 25 +++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/summarize_text.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/summarize_text.mdx index 55e662801b3..c8039ebbe45 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/summarize_text.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/summarize_text.mdx @@ -24,11 +24,25 @@ SELECT * FROM aidb.summarize_text( options => '{"model": "model__1952"}' ); +__OUTPUT__ + summarize_text +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + the girl yelled to her Mom as she heard the clicking, scratching noises outside of the living room window . she regretted watching the horror show she had been tuned into for the last half hour . the front door blew open with a thunderous noise . +(1 row) +``` + +```sql -- Positional arguments SELECT * FROM aidb.summarize_text( 'There are times when the night sky glows with bands of color. The bands may begin as cloud shapes and then spread into a great arc across the entire sky. They may fall in folds like a curtain drawn across the heavens. The lights usually grow brighter, then suddenly dim. During this time the sky glows with pale yellow, pink, green, violet, blue, and red. These lights are called the Aurora Borealis. Some people call them the Northern Lights. 
Scientists have been watching them for hundreds of years. They are not quite sure what causes them. In ancient times Long Beach City College WRSC Page 2 of 2 people were afraid of the Lights. They imagined that they saw fiery dragons in the sky. Some even concluded that the heavens were on fire.', '{"model": "model__1952"}' ); + +__OUTPUT__ + summarize_text +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + the night sky glows with bands of color . they may begin as cloud shapes and then spread into a great arc across the entire sky . the lights usually grow brighter, then suddenly dim . +(1 row) ``` ## Preparer with table data source @@ -50,7 +64,7 @@ SELECT aidb.create_table_preparer( source_table => 'source_table__1952', source_data_column => 'content', destination_table => 'summarized_data__1952', - destination_data_column => 'summaries', + destination_data_column => 'summary', source_key_column => 'id', destination_key_column => 'id', options => '{"model": "model__1952"}'::JSONB -- Configuration for the SummarizeText operation @@ -59,6 +73,13 @@ SELECT aidb.create_table_preparer( SELECT aidb.bulk_data_preparation('preparer__1952'); SELECT * FROM summarized_data__1952; + +__OUTPUT__ + id | summary +----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + 1 | the girl yelled to her Mom as she heard the clicking, scratching noises outside of the living room window . she regretted watching the horror show she had been tuned into for the last half hour . the front door blew open with a thunderous noise . + 2 | the night sky glows with bands of color . 
they may begin as cloud shapes and then spread into a great arc across the entire sky . the lights usually grow brighter, then suddenly dim . +(2 rows) ``` ## Model compatibility @@ -88,7 +109,7 @@ SELECT aidb.create_table_preparer( source_table => 'source_table__1952', source_data_column => 'content', destination_table => 'summarized_data__1952', - destination_data_column => 'summaries', + destination_data_column => 'summary', options => '{"model": "bert_model"}'::JSONB -- Incompatible model ); __OUTPUT__ From 7fc300944e06399040087a345ff0d61a2f4803d1 Mon Sep 17 00:00:00 2001 From: Noah Baculi Date: Wed, 14 May 2025 21:57:09 -0700 Subject: [PATCH 12/28] add tips to reference new Unnesting concept --- .../edb-postgres-ai/ai-accelerator/preparers/concepts.mdx | 6 ++++++ .../ai-accelerator/preparers/examples/chunk_text.mdx | 4 ++++ .../preparers/examples/chunk_text_auto_processing.mdx | 4 ++++ .../ai-accelerator/preparers/examples/parse_pdf.mdx | 4 ++++ .../ai-accelerator/preparers/examples/perform_ocr.mdx | 4 ++++ .../ai-accelerator/preparers/primitives.mdx | 7 ++++--- .../edb-postgres-ai/ai-accelerator/preparers/usage.mdx | 4 ++++ 7 files changed, 30 insertions(+), 3 deletions(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/concepts.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/concepts.mdx index f5667266fdc..6e38969954b 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/concepts.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/concepts.mdx @@ -34,6 +34,12 @@ Bulk data preparation performs a preparer's associated operation for all of the Bulk data preparation does not delete existing destination data unless it conflicts with newly generated data. It is recommended to configure separate destination tables for each preparer. !!! +## Unnesting + +Some Preparer [Primitives](./primitives) transform the shape of the data they are given. 
For example, `ChunkText` receives one text block and produces one or more text blocks. Rather than return nested collections of results, these Primitives automatically unnest (or "explode") their output, using a new `part_id` column to track the additional dimension. + +You can see this in action in [Primitives](./primitives) and in the applicable [examples](./examples). + ## Consistency with source data To ensure correct and consistent data, the prepared destination data must be in sync with the source data. In the case of the table data source, you can enable preparer auto processing to inform the preparer pipeline about changes to the source data. diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx index 8d92bb4d5d7..0c617d70705 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx @@ -5,6 +5,10 @@ description: Examples of using preparers with the ChunkText operation in AI Acce --- These examples use preparers with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator. +!!! Tip +This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail. +!!! 
+ ## Primitive ```sql diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx index 1991b802cba..d40f901bc12 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx @@ -10,6 +10,10 @@ Example of using preparer auto processing with the [ChunkText operation](../prim Many of the small confirmation output notices have been removed for brevity. !!! +!!! Tip +This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail. +!!! + ## Preparer with table data source ```sql diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx index 0c393de126a..55b3c5bf616 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx @@ -6,6 +6,10 @@ description: Examples of using preparers with the ParsePdf operation in AI Accel These examples use preparers with the [ParsePdf operation](../primitives#parse-pdf) in AI Accelerator. +!!! Tip +This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail. +!!! 
+ ## Primitive ```sql diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx index 9f2fbcd5159..befcbb7e777 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx @@ -6,6 +6,10 @@ description: Examples of using preparers with the PerformOcr operation in AI Acc Examples of using preparers with the [PerformOcr operation](../primitives#summarize-text) in AI Accelerator. +!!! Tip +This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail. +!!! + ## Model creation (required) This step is required for primitive single execution and for preparer bulk execution. diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx index a661a32883b..62771150bbf 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx @@ -35,7 +35,7 @@ __OUTPUT__ - The `max_length` size is the maximum possible chunk size that can be generated. Setting this to a value larger than `desired` means that the chunk should be as close to `desired` as possible but can be larger if it means staying at a larger semantic level. !!! Tip -This operation transforms the shape of the data, automatically unnesting collections. As a result, there may be multiple output rows for each input with a new `part_id` column to track the additional dimension. +This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](./concepts#unnesting) for more detail. !!! 
## Parse HTML @@ -104,9 +104,10 @@ __OUTPUT__ - `Structured` (Default) — Algorithmic text extraction. - The `allow_partial_parsing` flag determines whether to continue to parse PDFs when the parser encounters errors on one or more pages. Defaults to `true`. +- The `part_id` column in the output references the index of the page from which the text was extracted. !!! Tip -This operation transforms the shape of the data, automatically unnesting collections. As a result, there may be multiple output rows for each input with a new `part_id` column to track the additional dimension. +This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](./concepts#unnesting) for more detail. !!! ## Summarize text @@ -168,7 +169,7 @@ my_paddle_ocr_model - The `model` is the name of the created model to use for OCR. The model must support the `perform_ocr` operation. !!! Tip -This operation transforms the shape of the data, automatically unnesting collections. As a result, there may be multiple output rows for each input with a new `part_id` column to track the additional dimension. +This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](./concepts#unnesting) for more detail. !!! !!! Note diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx index 43213770488..264a762892c 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx @@ -32,6 +32,10 @@ aidb.create_table_preparer( ) ``` +!!! Tip +The `source_key_column` must be a unique key for the source data. If the data source is the output of a Preparer that [transforms the data shape](./concepts#unnesting) with a `part_id` column, make sure to use the new `unique_id` column. 
+!!! + ### Example: Creating a preparer ``` sql From 334ad987407ff0cef66ad1acd1b67babeadccf77 Mon Sep 17 00:00:00 2001 From: Noah Baculi Date: Wed, 14 May 2025 22:03:03 -0700 Subject: [PATCH 13/28] update source_key_column reference description to include uniqueness recommendation --- .../ai-accelerator/reference/knowledge_bases.mdx | 2 +- .../edb-postgres-ai/ai-accelerator/reference/preparers.mdx | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/knowledge_bases.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/knowledge_bases.mdx index 1588f8e5126..ae5dc00975d 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/knowledge_bases.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/knowledge_bases.mdx @@ -135,7 +135,7 @@ Creates a knowledge base for a given table. | source_table | regclass | Required | Name of the table to use as source. | | source_data_column | TEXT | Required | Column name in source table to use. | | source_data_format | [aidb.PipelineDataFormat](#aidbpipelinedataformat) | Required | Format of data in that column ("Text", "Image", "PDF"). | -| source_key_column | TEXT | 'id' | Column to use as key to reference the rows. | +| source_key_column | TEXT | 'id' | Unique column in the source table to use as key to reference the rows. | | vector_table | TEXT | NULL | | | vector_data_column | TEXT | 'embeddings' | | | vector_key_column | TEXT | 'id' | | diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/preparers.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/preparers.mdx index 84144aad956..4835af8c5f6 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/preparers.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/preparers.mdx @@ -58,7 +58,7 @@ Creates a preparer with a source data table. 
| source_data_column | TEXT | Required | Column in the source table containing the raw data | | destination_table | TEXT | Required | Name of the destination table | | destination_data_column | TEXT | Required | Column in the destination table for processed data | -| source_key_column | TEXT | 'id' | Column to use as key to reference the rows | +| source_key_column | TEXT | 'id' | Unique column in the source table to use as key to reference the rows. | | destination_key_column | TEXT | 'id' | Key column in the destination table that references the `source_key_column` | | options | JSONB | '{}'::JSONB | Configuration options for the data preparation operation. Uses the same API as the [data preparation primitives](../preparers/primitives.mdx). | From 7470073f01c399a06db90e0c067d28c3f3102340 Mon Sep 17 00:00:00 2001 From: Noah Baculi Date: Wed, 14 May 2025 22:32:58 -0700 Subject: [PATCH 14/28] init chained preparers ex --- .../preparers/examples/chained_preparers.mdx | 121 ++++++++++++++++++ .../examples/chunk_text_auto_processing.mdx | 4 - 2 files changed, 121 insertions(+), 4 deletions(-) create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chained_preparers.mdx diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chained_preparers.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chained_preparers.mdx new file mode 100644 index 00000000000..dab273a4fc8 --- /dev/null +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chained_preparers.mdx @@ -0,0 +1,121 @@ +--- +title: Preparer Chaining Example +navTitle: Preparer Chaining +description: Examples of using the preparer auto processing in AI Accelerator. +--- + +Example of chaining multiple preparers together with auto processing using the [ChunkText](../primitives#chunk-text) and [SummarizeText](../primitives#summarize-text) operations in AI Accelerator. 
+ +## Create the first Preparer to chunk text + +```sql +-- Create source test table +CREATE TABLE source_table__1321 +( + id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, + content TEXT NOT NULL +); + +SELECT aidb.create_table_preparer( + name => 'chunking_preparer__1321', + operation => 'ChunkText', + source_table => 'source_table__1321', + source_key_column => 'id', + source_data_column => 'content', + destination_table => 'chunked_data__1321', + destination_data_column => 'chunk', + destination_key_column => 'id', + options => '{"desired_length": 1, "max_length": 1000}'::JSONB -- Configuration for the ChunkText operation +); +``` + +## Create the second Preparer to summarize the chunked text + +```sql +-- Create the model. It must support the decode_text and decode_text_batch operations. +SELECT aidb.create_model('model__1321', 't5_local'); + +SELECT aidb.create_table_preparer( + name => 'summarizing_preparer__1321', + operation => 'SummarizeText', + source_table => 'chunked_data__1321', -- Reference the output from the ChunkText preparer + source_key_column => 'unique_id', -- Reference the unique column from the output of the ChunkText preparer + source_data_column => 'chunk', -- Reference the output from the ChunkText preparer + destination_table => 'summarized_data__1321', + destination_data_column => 'summary', + destination_key_column => 'chunk_unique_id', + options => '{"model": "model__1321"}'::JSONB -- Configuration for the SummarizeText operation +); +``` + +!!! Tip +This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail. +!!! 
+ +## Set both Preparers to Live automatic processing + +```sql +SELECT aidb.set_auto_preparer('chunking_preparer__1321', 'Live'); +SELECT aidb.set_auto_preparer('summarizing_preparer__1321', 'Live'); +``` + +## Insert data for processing + +Now, when we insert data into the source data table, we see processed results flowing automatically... + +```sql +INSERT INTO source_table__1321 +VALUES (1, 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. This enables processing or storage of data in manageable parts.'); +``` + +Chunks calculated automatically: + +```sql +SELECT * FROM chunked_data__1321; + +__OUTPUT__ + id | part_id | unique_id | chunk +----+---------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------- + 1 | 0 | 1.part.0 | This is a significantly longer text example that might require splitting into smaller chunks. + 1 | 1 | 1.part.1 | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. + 1 | 2 | 1.part.2 | This enables processing or storage of data in manageable parts. +(3 rows) +``` + +Summaries of the chunks calculated automatically: + +```sql +SELECT * FROM summarized_data__1321; + +__OUTPUT__ + chunk_unique_id | summary +-----------------+------------------------------------------------------------------------------------------------------ + 1.part.0 | text example might require splitting into smaller chunks . + 1.part.1 | the purpose of this function is to partition text data into segments of a specified maximum length . + 1.part.2 | enables processing or storage of data in manageable parts . 
+(3 rows) +``` + +The same automatic flow of logic occurs for deletions: + +```sql +DELETE FROM source_table__1321 WHERE id = 1; +``` + +```sql +SELECT * FROM chunked_data__1321; + +__OUTPUT__ + id | part_id | unique_id | chunk +----+---------+-----------+------- +(0 rows) +``` + +```sql +SELECT * FROM summarized_data__1321; + +__OUTPUT__ + chunk_unique_id | summary +-----------------+--------- +(0 rows) +``` diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx index d40f901bc12..c64bb9ef63e 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx @@ -6,10 +6,6 @@ description: Examples of using the preparer auto processing in AI Accelerator. Example of using preparer auto processing with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator. -!!! Note -Many of the small confirmation output notices have been removed for brevity. -!!! - !!! Tip This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail. !!! 
From d19c476e3e3b5aa3f0d997edc3e42afd3faffb91 Mon Sep 17 00:00:00 2001 From: Noah Baculi Date: Wed, 14 May 2025 22:33:05 -0700 Subject: [PATCH 15/28] init rel notes --- .../rel_notes/src/rel_notes_4.1.0.yml | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml index d7c8eebe66a..eadb80e9e45 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml @@ -5,13 +5,21 @@ date: 19 May 2025 intro: | This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline. highlights: | - - MOAR AI + - Automatic unnesting of Preparer results for operations that transform the shape of data. relnotes: -- relnote: Placeholder for future release note. +- relnote: Automatic unnesting of Preparer results for operations that transform the shape of data. details: | - Soon. + The preparer pipeline for operations that transform the shape of their input data with an additional dimension now unnests their result collections. + This allows the output of preparers to be consumed much more easily by other preparers or knowledge bases. + Unnested results are returned with a new `part_id` column to track the new dimension. There is also a new `unique_id` column to uniquely identify the combination of the source key and part_id. jira: "" addresses: "" type: Enhancement - impact: Medium - + impact: High +- relnote: Change output column for `chunk_text()` primitive function details: | + The enumeration column returned by the `chunk_text()` primitive function is now `part_id` to match the other Preparer primitives/operations.
+ jira: "" + addresses: "" + type: Enhancement + impact: Low From f726fe8819478ecd7419424be278093298e3afc1 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Thu, 15 May 2025 05:35:57 +0000 Subject: [PATCH 16/28] update generated release notes --- .../rel_notes/ai-accelerator_4.1.0_rel_notes.mdx | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx index cad6f23262f..2313ee61aa2 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx @@ -11,12 +11,16 @@ This is a minor release that includes a few bug fixes and enhancements to the kn ## Highlights -- MOAR AI +- Automatic unnesting of Preparer results for operations that transform the shape of data. ## Enhancements - +
DescriptionAddresses
Placeholder for future release note.

Soon.

+
Automatic unnesting of Preparer results for operations that transform the shape of data.

The preparer pipeline for operations that transform the shape of their input data with an additional dimension now unnests their result collections. +This allows the output of preparers to be consumed much more easily by other preparers or knowledge bases. +Unnested results are returned with a new part_id column to track the new dimension. There is also a new unique_id column to uniquely identify the combination of the source key and part_id.

+
Change output column for chunk_text() primitive function

The enumeration column returned by the chunk_text() primitive function is now part_id to match the other Preparer primitives/operations.

From 8bca94c03ef78c07c41bbac33d345e9e55cca08a Mon Sep 17 00:00:00 2001 From: Noah Baculi Date: Wed, 14 May 2025 22:37:38 -0700 Subject: [PATCH 17/28] refine rel note --- .../ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml index eadb80e9e45..6dc6e54c004 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml @@ -16,9 +16,10 @@ relnotes: addresses: "" type: Enhancement impact: High + - relnote: Change output column for `chunk_text()` primitive function details: | - The enumeration column returned by the `chunk_text()` primitive function is now `part_id` to match the other Preparer primitives/operations. + The enumeration column returned by the `chunk_text()` primitive function is now `part_id` instead of `chunk_id` to match the other Preparer primitives/operations. 
jira: "" addresses: "" type: Enhancement From cef7ae0a43ef683dd9e06eb6d9f02fdf050035cc Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Thu, 15 May 2025 05:39:02 +0000 Subject: [PATCH 18/28] update generated release notes --- .../ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx index 2313ee61aa2..00f6439f16d 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx @@ -20,7 +20,7 @@ This is a minor release that includes a few bug fixes and enhancements to the kn This allows the output of preparers to be consumed much more easily by other preparers or knowledge bases. Unnested results are returned with a new part_id column to track the new dimension. There is also a new unique_id column to uniquely identify the combination of the source key and part_id.

-
Change output column for chunk_text() primitive function

The enumeration column returned by the chunk_text() primitive function is now part_id to match the other Preparer primitives/operations.

+
Change output column for chunk_text() primitive function

The enumeration column returned by the chunk_text() primitive function is now part_id instead of chunk_id to match the other Preparer primitives/operations.

From a8aa0f4019d90537e2019b664be2e9f1e2b7684d Mon Sep 17 00:00:00 2001 From: Dj Walker-Morgan Date: Wed, 14 May 2025 17:53:18 +0100 Subject: [PATCH 19/28] Release Notes for 4.1.0 stubbed Signed-off-by: Dj Walker-Morgan --- .../ai-accelerator_4.1.0_rel_notes.mdx | 23 +++++++++++++++++++ .../ai-accelerator/rel_notes/index.mdx | 2 ++ .../rel_notes/src/rel_notes_4.1.0.yml | 17 ++++++++++++++ 3 files changed, 42 insertions(+) create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx new file mode 100644 index 00000000000..cad6f23262f --- /dev/null +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx @@ -0,0 +1,23 @@ +--- +title: AI Accelerator - Pipelines 4.1.0 release notes +navTitle: Version 4.1.0 +originalFilePath: advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml +editTarget: originalFilePath +--- + +Released: 19 May 2025 + +This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline. + +## Highlights + +- MOAR AI + +## Enhancements + + + +
DescriptionAddresses
Placeholder for future release note.

Soon.

+
+ + diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx index dbc87bf6dfd..a46870bd873 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx @@ -4,6 +4,7 @@ navTitle: Release notes description: Release notes for EDB Postgres AI - AI Accelerator indexCards: none navigation: + - ai-accelerator_4.1.0_rel_notes - ai-accelerator_4.0.1_rel_notes - ai-accelerator_4.0.0_rel_notes - ai-accelerator_3.0.1_rel_notes @@ -22,6 +23,7 @@ The EDB Postgres AI - AI Accelerator describes the latest version of AI Accelera | AI Accelerator version | Release Date | |---|---| +| [4.1.0](./ai-accelerator_4.1.0_rel_notes) | 19 May 2025 | | [4.0.1](./ai-accelerator_4.0.1_rel_notes) | 09 May 2025 | | [4.0.0](./ai-accelerator_4.0.0_rel_notes) | 05 May 2025 | | [3.0.1](./ai-accelerator_3.0.1_rel_notes) | 03 Apr 2025 | diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml new file mode 100644 index 00000000000..d7c8eebe66a --- /dev/null +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml @@ -0,0 +1,17 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/EnterpriseDB/docs/refs/heads/develop/tools/automation/generators/relgen/relnote-schema.json +product: AI Accelerator - Pipelines +version: 4.1.0 +date: 19 May 2025 +intro: | + This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline. +highlights: | + - MOAR AI +relnotes: +- relnote: Placeholder for future release note. + details: | + Soon. 
+ jira: "" + addresses: "" + type: Enhancement + impact: Medium + From e809ecf8199c4686da33f268ba387a3f31a1bd2d Mon Sep 17 00:00:00 2001 From: Dj Walker-Morgan Date: Thu, 15 May 2025 11:26:07 +0100 Subject: [PATCH 20/28] Remove New from front page Signed-off-by: Dj Walker-Morgan --- src/pages/index.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/pages/index.js b/src/pages/index.js index 921ba62eab1..6232e6907ab 100644 --- a/src/pages/index.js +++ b/src/pages/index.js @@ -282,7 +282,7 @@ const Page = () => { Get Started with Pipelines - New: AI Accelerator Preparers + AI Accelerator Preparers PGvector From 75e81762f2014e70cb04a15542c949e5f13f5d53 Mon Sep 17 00:00:00 2001 From: Dj Walker-Morgan Date: Wed, 14 May 2025 17:53:18 +0100 Subject: [PATCH 21/28] Release Notes for 4.1.0 stubbed Signed-off-by: Dj Walker-Morgan --- .../ai-accelerator_4.1.0_rel_notes.mdx | 23 +++++++++++++++++++ .../ai-accelerator/rel_notes/index.mdx | 2 ++ .../rel_notes/src/rel_notes_4.1.0.yml | 17 ++++++++++++++ 3 files changed, 42 insertions(+) create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx new file mode 100644 index 00000000000..cad6f23262f --- /dev/null +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx @@ -0,0 +1,23 @@ +--- +title: AI Accelerator - Pipelines 4.1.0 release notes +navTitle: Version 4.1.0 +originalFilePath: advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml +editTarget: originalFilePath +--- + +Released: 19 May 2025 + +This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline. 
+ +## Highlights + +- MOAR AI + +## Enhancements + + + +
DescriptionAddresses
Placeholder for future release note.

Soon.

+
+ + diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx index dbc87bf6dfd..a46870bd873 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx @@ -4,6 +4,7 @@ navTitle: Release notes description: Release notes for EDB Postgres AI - AI Accelerator indexCards: none navigation: + - ai-accelerator_4.1.0_rel_notes - ai-accelerator_4.0.1_rel_notes - ai-accelerator_4.0.0_rel_notes - ai-accelerator_3.0.1_rel_notes @@ -22,6 +23,7 @@ The EDB Postgres AI - AI Accelerator describes the latest version of AI Accelera | AI Accelerator version | Release Date | |---|---| +| [4.1.0](./ai-accelerator_4.1.0_rel_notes) | 19 May 2025 | | [4.0.1](./ai-accelerator_4.0.1_rel_notes) | 09 May 2025 | | [4.0.0](./ai-accelerator_4.0.0_rel_notes) | 05 May 2025 | | [3.0.1](./ai-accelerator_3.0.1_rel_notes) | 03 Apr 2025 | diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml new file mode 100644 index 00000000000..d7c8eebe66a --- /dev/null +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml @@ -0,0 +1,17 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/EnterpriseDB/docs/refs/heads/develop/tools/automation/generators/relgen/relnote-schema.json +product: AI Accelerator - Pipelines +version: 4.1.0 +date: 19 May 2025 +intro: | + This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline. +highlights: | + - MOAR AI +relnotes: +- relnote: Placeholder for future release note. + details: | + Soon. 
+ jira: "" + addresses: "" + type: Enhancement + impact: Medium + From 5159c178528a98dd05ee6c75521fa05b0edd60ba Mon Sep 17 00:00:00 2001 From: Dj Walker-Morgan Date: Thu, 15 May 2025 11:26:07 +0100 Subject: [PATCH 22/28] Remove New from front page Signed-off-by: Dj Walker-Morgan --- src/pages/index.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/pages/index.js b/src/pages/index.js index 921ba62eab1..6232e6907ab 100644 --- a/src/pages/index.js +++ b/src/pages/index.js @@ -282,7 +282,7 @@ const Page = () => { Get Started with Pipelines
- New: AI Accelerator Preparers + AI Accelerator Preparers PGvector From 3540c527582adae10202aedaa3c767f8e6e3da9e Mon Sep 17 00:00:00 2001 From: Tim Waizenegger Date: Thu, 15 May 2025 13:12:56 +0200 Subject: [PATCH 23/28] Notes about model batch processing --- .../models/supported-models/embeddings.mdx | 34 +++++++++++++++---- .../rel_notes/src/rel_notes_4.1.0.yml | 17 +++++++--- 2 files changed, 41 insertions(+), 10 deletions(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx index 9e9e14cf5a4..3679a491a6b 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx @@ -42,7 +42,7 @@ Based on the name of the model, the model provider sets defaults accordingly: ## Creating the default with OpenAI model ```sql -SELECT aidb.create_model('my_openai_embeddings', +SELECT aidb.create_model('my_openai_embeddings', 'openai_embeddings', credentials=>'{"api_key": "sk-abc123xyz456def789ghi012jkl345mn"'::JSONB); ``` @@ -58,7 +58,7 @@ SELECT aidb.create_model( 'my_openai_model', 'openai_embeddings', '{"model": "text-embedding-3-small"}'::JSONB, - '{"api_key": "sk-abc123xyz456def789ghi012jkl345mn"}'::JSONB + '{"api_key": "sk-abc123xyz456def789ghi012jkl345mn"}'::JSONB ); ``` @@ -69,12 +69,34 @@ Because this example is passing the configuration options and the credentials, u The following configuration settings are available for OpenAI models: * `model` — The OpenAI model to use. -* `url` — The URL of the model to use. This value is optional and can be used to specify a custom model URL. - * If `openai_completions` (or `completions`) is the `model`, `url` defaults to `https://api.openai.com/v1/chat/completions`. +* `url` — The URL of the model to use. This value is optional and can be used to specify a custom model URL. 
+ * If `openai_completions` (or `completions`) is the `model`, `url` defaults to `https://api.openai.com/v1/chat/completions`. * If `nim_completions` is the `model`, `url` defaults to `https://integrate.api.nvidia.com/v1/chat/completions`. * `max_concurrent_requests` — The maximum number of concurrent requests to make to the OpenAI model. The default is `25`. - -## Model credentials +* `max_batch_size` — The maximum number of records to send to the model in a single request. The default is `50000`. + +### Batch and parallel processing +The model providers for `embeddings`, `openai_embeddings`, and `nim_embeddings` support sending batch requests as well as concurrent requests. +The two settings `max_concurrent_requests` and `max_batch_size` control this behavior. When a model provider receives a set of records (e.g., from a knowledge base pipeline) + the following happens: +* Assuming the knowledge base pipeline is configured with batch size 10,000. +* And the model provider is configured with `max_batch_size=1000` and `max_concurrent_requests=5`. +* Then, the provider will collect up to 1000 records and send them in a single request to the model. +* And it will send 5 such large requests concurrently, until no more input records are left. +* So in this example, the provider needs to send/receive 10 batches in total. + * After sending the first 5, it waits for the responses to return. + * Once a response is received, another request can be sent. + * This means the provider won't wait for all 5 to return before sending off the next 5. Instead, it always keeps up to 5 requests in flight. + +!!! Note +The settings `max_concurrent_requests` and `max_batch_size` can have a significant impact on model performance, but the best values depend heavily on +the hardware and infrastructure. + +We recommend testing different combinations by using a knowledge base pipeline. See our model performance tuning guide here: TODO +!!!
+ + +### Model credentials The following credentials may be required by the service providing these models. Note: `api_key` and `basic_auth` are mutually exclusive. Only one of these two options can be used. * `api_key` — The API key to use for Bearer Token authentication. The api_key will be sent in a header field as `Authorization: Bearer `. diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml index 109b75ade5a..be37e6ba9ff 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml @@ -3,25 +3,34 @@ product: AI Accelerator - Pipelines version: 4.1.0 date: 19 May 2025 intro: | - This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline. + This is a minor release that includes enhancements to the preparer pipeline and the model API providers. highlights: | - Automatic unnesting of Preparer results for operations that transform the shape of data. + - Batch processing for embeddings with external models. relnotes: - relnote: Automatic unnesting of Preparer results for operations that transform the shape of data. details: | The preparer pipeline for operations that transform the shape of their input data with an additional dimension now unnests their result collections. This allows the output of preparers to be consumed much more easily by other preparers or knowledge bases. Unnested results are returned with a new `part_id` column to track the new dimension. There is also a new `unique_id` column to uniquely identify the combination of the source key and `part_id`. - jira: "" + jira: "AID-410" addresses: "" type: Enhancement impact: High -- relnote: Change output column for `chunk_text()` primitive function +- relnote: Change output column for `chunk_text()` primitive function. 
details: | The enumeration column returned by the `chunk_text()` primitive function is now `part_id` instead of `chunk_id` to match the other Preparer primitives/operations. - jira: "" + jira: "AID-410" addresses: "" type: Enhancement impact: Low +- relnote: Batch processing for embeddings with external models. + details: | + The external model providers `embeddings`, `openai_embeddings`, and `nim_embeddings` can now send a batch of inputs in a single request, rather than multiple concurrent requests. + This can improve performance and hardware utilization. The feature is fully configurable and can also be disabled. + jira: "AID-419" + addresses: "" + type: Enhancement + impact: Medium From 2d3d86760ce5dc01cbeb021bf1f7cf2965cf1f5c Mon Sep 17 00:00:00 2001 From: Tim Waizenegger Date: Thu, 15 May 2025 15:46:26 +0200 Subject: [PATCH 24/28] performance tuning guide --- .../capabilities/auto-processing.mdx | 6 +- .../knowledge_base/performance_tuning.mdx | 117 ++++++++++++++++++ .../models/supported-models/embeddings.mdx | 3 +- 3 files changed, 124 insertions(+), 2 deletions(-) create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx index 920ccae199d..d5ec8c3f827 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx @@ -123,7 +123,11 @@ As well as for existing pipelines: - With [`aidb.set_auto_knowledge_base`](../reference/knowledge_bases#aidbset_auto_knowledge_base) ## Batch processing -In Background and Disabled modes, (auto) processing happens in batches of configurable size. Within each batch, +In Background and Disabled modes, (auto) processing happens in batches of configurable size. 
The pipeline will process all source records in batches. +All records within each batch are processed in parallel wherever possible. This means pipeline steps like data retrieval, embeddings computation, and storing embeddings will run as parallel operations. +For example, when using a table as a data source, a batch of input records will be retrieved with a single query. With a volume source, concurrent requests will be used to retrieve a batch of records. + +Our [knowledge base pipeline performance tuning guide](../knowledge_base/performance_tuning) explains how the batch size can be tuned for optimal throughput. ## Change detection AIDB auto-processing is designed around change detection mechanisms for table and volume data sources. This allows it to only diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx new file mode 100644 index 00000000000..3d338d14f8e --- /dev/null +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx @@ -0,0 +1,117 @@ +--- +title: "Pipelines knowledge base performance tuning" +navTitle: "Performance tuning" +deepToC: true +description: "How to tune the performance of knowledge base pipelines." +--- + + +## Background +Pipeline performance (that is, the number of embeddings computed per second) can be optimized by changing pipeline and model settings. +This guide explains the relevant settings and shows how to tune them. + +Knowledge base pipelines process collections of individual records (rows in a table or objects in a volume). Rather than processing each record individually and sequentially, or processing all of them concurrently, +AIDB offers batch processing. All the batches get processed sequentially, one after the other. Within each batch, records get processed concurrently wherever possible. 
+ +- [Pipeline `batch_size`](../capabilities/auto-processing) determines how many records each batch should have. +- Some model providers have configurable internal batch/parallel processing. We recommend leaving these settings at the default values and using the pipeline batch size to control execution. + + +## Testing and tuning performance +We will first set up test data and a knowledge base pipeline, then measure and tune the batch size. + +### 1) Create a table and insert test data +The actual data content does not matter for this test, so we can generate data: +```sql +CREATE TABLE test_data_10k (id INT PRIMARY KEY, msg TEXT NOT NULL); + +INSERT INTO test_data_10k (id, msg) SELECT generate_series(1, 10000) AS id, 'hello world'; +``` + + +### 2) Create a knowledge base pipeline +The optimal batch size may be very different for different models. Measure and tune the batch size for each different model you want to use. +```sql +SELECT aidb.create_table_knowledge_base( + name => 'perf_test', + model_name => 'my_model', -- use the model you want to optimize for + source_table => 'test_data_10k', + source_data_column => 'msg', + source_data_format => 'Text', + auto_processing => 'Disabled', -- we want to manually run the pipeline to measure the runtime + batch_size => 100 -- this is the parameter we will tune during this test +); +__OUTPUT__ +INFO: using vector table: public.perf_test_vector +NOTICE: index "vdx_perf_test_vector" does not exist, skipping +NOTICE: auto-processing is set to "Disabled". Manually run "SELECT aidb.bulk_embedding('perf_test');" to compute embeddings. + create_table_knowledge_base +----------------------------- + perf_test +(1 row) +``` + +### 3) Run the pipeline, measure the performance +We use `psql` in this test; the `\timing on` command is a feature of `psql`. If you use a different interface, check how it can display timing information. + +```sql +\timing on +__OUTPUT__ +Timing is on. 
+``` + +Now run the pipeline: +```sql +SELECT aidb.bulk_embedding('perf_test'); +__OUTPUT__ +INFO: perf_test: (re)setting state table to process all data... +INFO: perf_test: Starting... Batch size 100, unprocessed rows: 10000, count(source records): 10000, count(embeddings): 0 +INFO: perf_test: Batch iteration finished, unprocessed rows: 9900, count(source records): 10000, count(embeddings): 100 +INFO: perf_test: Batch iteration finished, unprocessed rows: 9800, count(source records): 10000, count(embeddings): 200 +... +INFO: perf_test: Batch iteration finished, unprocessed rows: 0, count(source records): 10000, count(embeddings): 10000 +INFO: perf_test: finished, unprocessed rows: 0, count(source records): 10000, count(embeddings): 10000 + bulk_embedding +---------------- + +(1 row) + +Time: 207161,174 ms (03:27,161) +``` + + + +### 4) Tune the batch size +You can use this call to adjust the batch size of the pipeline. Here we increase it tenfold, to 1000 records: +```sql +SELECT aidb.set_auto_knowledge_base('perf_test', 'Disabled', batch_size=>1000); +``` + +Run the pipeline again. + +!!! Note +When using a Postgres table as the source, with auto-processing disabled, AIDB has no means to detect changes in the source data. So each `bulk_embedding` call has to re-process everything. + +This is convenient for performance testing. + +If you want to measure performance with a volume source, you should delete and re-create the knowledge base between tests. AIDB is able to detect changes on volumes even with auto-processing disabled. + +!!! +```sql +SELECT aidb.bulk_embedding('perf_test'); +__OUTPUT__ +INFO: perf_test: (re)setting state table to process all data... +INFO: perf_test: Starting... Batch size 1000, unprocessed rows: 10000, count(source records): 10000, count(embeddings): 10000 +... 
+INFO: perf_test: finished, unprocessed rows: 0, count(source records): 10000, count(embeddings): 10000 + bulk_embedding +---------------- + +(1 row) + +Time: 154276,486 ms (02:34,276) +``` + + +## Conclusion +In this test, the pipeline took 3:27 (min:sec) with a batch size of 100 and 2:34 with a batch size of 1000. You can continue testing larger sizes until performance no longer improves, or even declines. diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx index 3679a491a6b..1bdd2a99a32 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx @@ -92,7 +92,8 @@ The two settings `max_concurrent_requests` and `max_batch_size` control this beh The settings `max_concurrent_requests` and `max_batch_size` can have a significant impact on model performance. But they highly depend on the hardware and infrastructure. -We recommend testing different combinations by using a knowledge base pipeline. See our model performance tuning guide here: TODO +We recommend leaving the defaults in place and [tuning the performance via the knowledge base pipeline batch size](../../knowledge_base/performance_tuning). +The default `max_batch_size` of 50,000 is intentionally high to allow the pipeline to control the actual size of the batches. !!! 
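To compare the two timed runs of the tuning guide as throughput rather than wall-clock time, the reported timings can be converted to embeddings per second. The query below is only arithmetic on the figures shown above (10,000 records, 207.161 s at batch size 100, 154.276 s at batch size 1000); the column names are illustrative and it adds no new measurements.

```sql
-- Throughput for each run and the relative speedup, from the psql timings above.
SELECT round(10000 / 207.161, 1)   AS embeddings_per_sec_batch_100,   -- about 48.3
       round(10000 / 154.276, 1)   AS embeddings_per_sec_batch_1000,  -- about 64.8
       round(207.161 / 154.276, 2) AS speedup;                        -- about 1.34
```

Tracking the same ratio across further batch sizes makes it easier to see where throughput stops improving.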
From 5ed14e5a90434cf2ee4517438d23838925853519 Mon Sep 17 00:00:00 2001 From: Tim Waizenegger Date: Thu, 15 May 2025 16:28:58 +0200 Subject: [PATCH 25/28] note on index type --- .../ai-accelerator/knowledge_base/performance_tuning.mdx | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx index 3d338d14f8e..e3c377cf8b8 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx @@ -16,6 +16,9 @@ AIDB offers batch processing. All the batches get processed sequentially, one af - [Pipeline `batch_size`](../capabilities/auto-processing) determines how many records each batch should have - Some model providers have configurable internal batch/parallel processing. We recommend leaving these settings at the default values and using the pipeline batch size to control execution. +!!! Note +Vector indexing also has an impact on pipeline performance. You can disable the vector index by using `index_type => 'disabled'` to exclude it from your measurements. +!!! ## Testing and tuning performance We will first set up test data and a knowledge base pipeline, then measure and tune the batch size. @@ -33,11 +36,12 @@ INSERT INTO test_data_10k (id, msg) SELECT generate_series(1, 10000) AS id, 'hel The optimal batch size may be very different for different models. Measure and tune the batch size for each different model you want to use. 
```sql SELECT aidb.create_table_knowledge_base( - name => 'perf_test', - model_name => 'my_model', -- use the model you want to optimize for + name => 'perf_test_b', + model_name => 'dummy', -- use the model you want to optimize for source_table => 'test_data_10k', source_data_column => 'msg', source_data_format => 'Text', + index_type => 'disabled', -- optionally disable vector indexing to exclude it from the measurement auto_processing => 'Disabled', -- we want to manually run the pipeline to measure the runtime batch_size => 100 -- this is the parameter we will tune during this test ); From 49150a7da22156193ce53464ae6d6bd50c0bf2b3 Mon Sep 17 00:00:00 2001 From: Tim Waizenegger Date: Thu, 15 May 2025 16:30:50 +0200 Subject: [PATCH 26/28] note on index type --- .../ai-accelerator/knowledge_base/performance_tuning.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx index e3c377cf8b8..bf6e8b3869d 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx @@ -24,7 +24,7 @@ vector indexing also has an impact on pipeline performance. You can disable the We will first set up test data and a knowledge base pipeline, then measure and tune the batch size. ### 1) Create a table and insert test data -The actual data content does not matter for this test, so we can generate data: +The length of the data content has some impact on model performance. You can use longer text to test that. 
```sql CREATE TABLE test_data_10k (id INT PRIMARY KEY, msg TEXT NOT NULL); From 0b3a128e02760943368e5026acfeaa92cb9ccf86 Mon Sep 17 00:00:00 2001 From: Tim Waizenegger Date: Mon, 19 May 2025 10:23:45 +0200 Subject: [PATCH 27/28] document PGFS non-https usage --- .../ai-accelerator/pgfs/functions/s3.mdx | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/s3.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/s3.mdx index a97bd57cc69..ba30235113a 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/s3.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/s3.mdx @@ -25,6 +25,7 @@ select pgfs.create_storage_location( | `skip_signature` | Disable HMAC authentication (set this to "true" when you're not providing access_key_id/secret_access_key in the credentials). | | `region` | The region of the S3-compatible storage system. If the region is not specified, the client will attempt auto-discovery. | | `endpoint` | The endpoint of the S3-compatible storage system. | +| `allow_http` | Whether the endpoint uses plain HTTP (rather than HTTPS/TLS). Set this to `true` if your endpoint starts with `http://`. | ### The `credentials` argument in JSON format offers the following settings: | Option | Description | @@ -53,7 +54,7 @@ SELECT pgfs.create_storage_location('internal_ai_project', 's3://my-company-ai-i ); ``` -## Example: non-AWS S3 / S3-compatible +## Example: non-AWS S3 / S3-compatible with HTTPS This is an example of using an S3-compatible system like minIO. The `endpoint` must be provided in this case; it can only be omitted when using AWS S3. ```sql @@ -63,4 +64,16 @@ SELECT pgfs.create_storage_location('ai_images_local_minio', 's3://my-ai-images' ); ``` +## Example: non-AWS S3 / S3-compatible with HTTP +This is an example of using an S3-compatible system like minIO. 
The `endpoint` must be provided in this case; it can only be omitted when using AWS S3. + +In this case, the server does not use TLS encryption; so we configure a plain HTTP connection. + +```sql +SELECT pgfs.create_storage_location('ai_images_local_minio', 's3://my-ai-images', + options => '{"endpoint": "http://minio-api.apps.local", "allow_http":"true"}', + credentials => '{"access_key_id": "my_username", "secret_access_key":"my_password"}' + ); +``` + From 19ac85e279649466205aa71ca179cfdfd81d01c6 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Mon, 19 May 2025 08:27:15 +0000 Subject: [PATCH 28/28] update generated release notes --- .../rel_notes/ai-accelerator_4.1.0_rel_notes.mdx | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx index 86e955b1e5a..51174a0fb6e 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx @@ -7,16 +7,24 @@ editTarget: originalFilePath Released: 19 May 2025 -This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline. +This is a minor release that includes enhancements to the preparer pipeline and the model API providers. ## Highlights - Automatic unnesting of Preparer results for operations that transform the shape of data. +- Batch processing for embeddings with external models. ## Enhancements - + +
DescriptionAddresses
Placeholder for future release note.

Soon.

+
Automatic unnesting of Preparer results for operations that transform the shape of data.

The preparer pipeline for operations that transform the shape of their input data with an additional dimension now unnests their result collections. +This allows the output of preparers to be consumed much more easily by other preparers or knowledge bases. +Unnested results are returned with a new part_id column to track the new dimension. There is also a new unique_id column to uniquely identify the combination of the source key and part_id.

+
Batch processing for embeddings with external models.

The external model providers embeddings, openai_embeddings, and nim_embeddings can now send a batch of inputs in a single request, rather than multiple concurrent requests. +This can improve performance and hardware utilization. The feature is fully configurable and can also be disabled.

+
Change output column for chunk_text() primitive function.

The enumeration column returned by the chunk_text() primitive function is now part_id instead of chunk_id to match the other Preparer primitives/operations.