diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx
index 920ccae199d..947dead787b 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx
@@ -123,7 +123,11 @@ As well as for existing pipelines:
 - With [`aidb.set_auto_knowledge_base`](../reference/knowledge_bases#aidbset_auto_knowledge_base)

## Batch processing
-In Background and Disabled modes, (auto) processing happens in batches of configurable size. Within each batch,
+In Background and Disabled modes, (auto) processing happens in batches of configurable size. The pipeline processes all source records batch by batch.
+All records within each batch are processed in parallel wherever possible. This means pipeline steps like data retrieval, embeddings computation, and storing embeddings run as parallel operations.
+For example, when using a table as a data source, a batch of input records is retrieved with a single query. With a volume source, concurrent requests are used to retrieve a batch of records.
+
+Our [knowledge base pipeline performance tuning guide](../knowledge_base/performance_tuning) explains how the batch size can be tuned for optimal throughput.

## Change detection
AIDB auto-processing is designed around change detection mechanisms for table and volume data sources. This allows it to only
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx
new file mode 100644
index 00000000000..bf6e8b3869d
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx
@@ -0,0 +1,121 @@
+---
+title: "Pipelines knowledge base performance tuning"
+navTitle: "Performance tuning"
+deepToC: true
+description: "How to tune the performance of knowledge base pipelines."
+---
+
+
+## Background
+The performance (i.e., the throughput of embeddings per second) can be optimized by changing pipeline and model settings.
+This guide explains the relevant settings and shows how to tune them.
+
+Knowledge base pipelines process collections of individual records (rows in a table or objects in a volume). Rather than processing each record individually and sequentially, or processing all of them concurrently,
+AIDB offers batch processing. All the batches are processed sequentially, one after the other. Within each batch, records are processed concurrently wherever possible.
+
+- [Pipeline `batch_size`](../capabilities/auto-processing) determines how many records each batch contains.
+- Some model providers have configurable internal batch/parallel processing. We recommend leaving these settings at their default values and using the pipeline batch size to control execution.
+
+!!! Note
+Vector indexing also has an impact on pipeline performance. You can disable the vector index by using `index_type => 'disabled'` to exclude it from your measurements.
+!!!
+
+## Testing and tuning performance
+We will first set up test data and a knowledge base pipeline, then measure and tune the batch size.
+
+### 1) Create a table and insert test data
+The length of the data content has some impact on model performance. You can use longer text to test that.
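+
+For example, a variation of the `INSERT` statement below can generate longer content per record with `repeat()`. A minimal sketch (the repetition factor of 50 is arbitrary):
+
+```sql
+-- Hypothetical variation: longer input text for each record
+INSERT INTO test_data_10k (id, msg)
+SELECT generate_series(1, 10000) AS id, repeat('hello world ', 50);
+```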
+```sql
+CREATE TABLE test_data_10k (id INT PRIMARY KEY, msg TEXT NOT NULL);
+
+INSERT INTO test_data_10k (id, msg) SELECT generate_series(1, 10000) AS id, 'hello world';
+```
+
+
+### 2) Create a knowledge base pipeline
+The optimal batch size can differ significantly between models. Measure and tune the batch size for each model you want to use.
+```sql
+SELECT aidb.create_table_knowledge_base(
+    name => 'perf_test',
+    model_name => 'dummy',          -- use the model you want to optimize for
+    source_table => 'test_data_10k',
+    source_data_column => 'msg',
+    source_data_format => 'Text',
+    index_type => 'disabled',       -- optionally disable vector indexing to exclude it from the measurement
+    auto_processing => 'Disabled',  -- we run the pipeline manually to measure the runtime
+    batch_size => 100               -- this is the parameter we will tune during this test
+);
+__OUTPUT__
+INFO: using vector table: public.perf_test_vector
+NOTICE: index "vdx_perf_test_vector" does not exist, skipping
+NOTICE: auto-processing is set to "Disabled". Manually run "SELECT aidb.bulk_embedding('perf_test');" to compute embeddings.
+ create_table_knowledge_base
+-----------------------------
+ perf_test
+(1 row)
+```
+
+### 3) Run the pipeline, measure the performance
+We use `psql` in this test; the `\timing on` command is a psql feature. If you use a different client, check how it displays timing information.
+
+```sql
+\timing on
+__OUTPUT__
+Timing is on.
+```
+
+Now run the pipeline:
+```sql
+SELECT aidb.bulk_embedding('perf_test');
+__OUTPUT__
+INFO: perf_test: (re)setting state table to process all data...
+INFO: perf_test: Starting... Batch size 100, unprocessed rows: 10000, count(source records): 10000, count(embeddings): 0
+INFO: perf_test: Batch iteration finished, unprocessed rows: 9900, count(source records): 10000, count(embeddings): 100
+INFO: perf_test: Batch iteration finished, unprocessed rows: 9800, count(source records): 10000, count(embeddings): 200
+...
+INFO: perf_test: Batch iteration finished, unprocessed rows: 0, count(source records): 10000, count(embeddings): 10000
+INFO: perf_test: finished, unprocessed rows: 0, count(source records): 10000, count(embeddings): 10000
+ bulk_embedding
+----------------
+
+(1 row)
+
+Time: 207161,174 ms (03:27,161)
+```
+
+
+
+### 4) Tune the batch size
+Use this call to adjust the batch size of the pipeline. We increase it by 10x, to 1000 records:
+```sql
+SELECT aidb.set_auto_knowledge_base('perf_test', 'Disabled', batch_size=>1000);
+```
+
+Run the pipeline again.
+
+!!! Note
+When using a Postgres table as the source with auto-processing disabled, AIDB has no means to detect changes in the source data, so each `bulk_embedding` call has to re-process everything.
+
+This is convenient for performance testing.
+
+If you want to measure performance with a volume source, delete and re-create the knowledge base between tests. AIDB is able to detect changes on volumes even with auto-processing disabled.
+
+!!!
+```sql
+SELECT aidb.bulk_embedding('perf_test');
+__OUTPUT__
+INFO: perf_test: (re)setting state table to process all data...
+INFO: perf_test: Starting... Batch size 1000, unprocessed rows: 10000, count(source records): 10000, count(embeddings): 10000
+...
+INFO: perf_test: finished, unprocessed rows: 0, count(source records): 10000, count(embeddings): 10000
+ bulk_embedding
+----------------
+
+(1 row)
+
+Time: 154276,486 ms (02:34,276)
+```
+
+
+## Conclusion
+In this test, the pipeline took 02:34 min with batch size 1000 and 03:27 min with batch size 100. You can continue testing larger batch sizes until performance no longer improves, or even declines.
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx
index 9e9e14cf5a4..1bdd2a99a32 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx
@@ -42,7 +42,7 @@ Based on the name of the model, the model provider sets defaults accordingly:
## Creating the default with OpenAI model

```sql
-SELECT aidb.create_model('my_openai_embeddings', 
+SELECT aidb.create_model('my_openai_embeddings',
    'openai_embeddings',
    credentials=>'{"api_key": "sk-abc123xyz456def789ghi012jkl345mn"'::JSONB);
```
@@ -58,7 +58,7 @@ SELECT aidb.create_model(
    'my_openai_model',
    'openai_embeddings',
    '{"model": "text-embedding-3-small"}'::JSONB,
-    '{"api_key": "sk-abc123xyz456def789ghi012jkl345mn"}'::JSONB 
+    '{"api_key": "sk-abc123xyz456def789ghi012jkl345mn"}'::JSONB
);
```
@@ -69,12 +69,35 @@ Because this example is passing the configuration options and the credentials, u
The following configuration settings are available for OpenAI models:

* `model` — The OpenAI model to use.
-* `url` — The URL of the model to use. This value is optional and can be used to specify a custom model URL. 
-    * If `openai_completions` (or `completions`) is the `model`, `url` defaults to `https://api.openai.com/v1/chat/completions`. 
+* `url` — The URL of the model to use. This value is optional and can be used to specify a custom model URL.
+    * If `openai_completions` (or `completions`) is the `model`, `url` defaults to `https://api.openai.com/v1/chat/completions`.
    * If `nim_completions` is the `model`, `url` defaults to `https://integrate.api.nvidia.com/v1/chat/completions`.
* `max_concurrent_requests` — The maximum number of concurrent requests to make to the OpenAI model. The default is `25`.
-
-## Model credentials
+* `max_batch_size` — The maximum number of records to send to the model in a single request. The default is `50,000`.
+
+### Batch and parallel processing
+The model providers for `embeddings`, `openai_embeddings`, and `nim_embeddings` support sending batch requests as well as concurrent requests.
+The two settings `max_concurrent_requests` and `max_batch_size` control this behavior. When a model provider receives a set of records (e.g., from a knowledge base pipeline),
+the following happens:
+* Assume the knowledge base pipeline is configured with a batch size of 10,000.
+* And the model provider is configured with `max_batch_size=1000` and `max_concurrent_requests=5`.
+* The provider then collects up to 1,000 records and sends them in a single request to the model.
+* And it sends 5 such large requests concurrently, until no more input records are left.
+* So in this example, the provider needs to send/receive 10 batches in total.
+    * After sending the first 5, it waits for the responses to return.
+    * Once a response is received, another request can be sent.
+    * This means the provider won't wait for all 5 to return before sending off the next 5. Instead, it always keeps up to 5 requests in flight.
+
+!!! Note
+The settings `max_concurrent_requests` and `max_batch_size` can have a significant impact on model performance, but the optimal values depend heavily on
+the hardware and infrastructure.
+
+We recommend leaving the defaults in place and [tuning the performance via the knowledge base pipeline batch size](../../knowledge_base/performance_tuning).
+The default `max_batch_size` of 50,000 is intentionally high to allow the pipeline to control the actual size of the batches.
+!!!
+
+
+### Model credentials
The following credentials may be required by the service providing these models.

Note: `api_key` and `basic_auth` are exclusive. Only one of these two options can be used.

* `api_key` — The API key to use for Bearer Token authentication. The api_key will be sent in a header field as `Authorization: Bearer `.
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/gcs.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/gcs.mdx
new file mode 100644
index 00000000000..3e5ba17f165
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/gcs.mdx
@@ -0,0 +1,45 @@
+---
+title: "Pipelines PGFS with Google Cloud Storage"
+navTitle: "Google Cloud storage"
+description: "PGFS options and credentials with Google Cloud Storage."
+---
+
+
+## Overview: Google Cloud Storage
+PGFS uses the `gs:` prefix to indicate a Google Cloud Storage bucket.
+
+The general syntax for using GCS is this:
+```sql
+select pgfs.create_storage_location(
+    'storage_location_name',
+    'gs://bucket_name',
+    credentials => '{}'::JSONB
+    );
+```
+
+### The `credentials` argument in JSON format offers the following settings:
+| Option                             | Description                               |
+|------------------------------------|-------------------------------------------|
+| `google_application_credentials`   | Path to the application credentials file  |
+| `google_service_account_key_file`  | Path to the service account key file      |
+
+See the [Google Cloud documentation](https://cloud.google.com/iam/docs/keys-create-delete#creating) for more information on how to manage service account keys.
+
+These options can also be set via the equivalent environment variables to facilitate authentication in managed environments such as Google Kubernetes Engine.
+
+## Example: private GCS bucket
+
+```sql
+SELECT pgfs.create_storage_location('edb_ai_example_images', 'gs://my-company-ai-images',
+    credentials => '{"google_service_account_key_file": "/var/run/gcs.json"}'
+    );
+```
+
+## Example: authentication in GKE
+
+Ensure that the `GOOGLE_APPLICATION_CREDENTIALS` or the `GOOGLE_SERVICE_ACCOUNT_KEY_FILE` environment variable
+is set on your PostgreSQL pod. Then, PGFS picks it up automatically:
+
+```sql
+SELECT pgfs.create_storage_location('edb_ai_example_images', 'gs://my-company-ai-images');
+```
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/s3.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/s3.mdx
index a97bd57cc69..ba30235113a 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/s3.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/s3.mdx
@@ -25,6 +25,7 @@ select pgfs.create_storage_location(
| `skip_signature`   | Disable HMAC authentication (set this to "true" when you're not providing access_key_id/secret_access_key in the credentials). |
| `region`           | The region of the S3-compatible storage system. If the region is not specified, the client will attempt auto-discovery. |
| `endpoint`         | The endpoint of the S3-compatible storage system. |
+| `allow_http`       | Whether the endpoint uses plain HTTP (rather than HTTPS/TLS). Set this to `true` if your endpoint starts with `http://`. |

### The `credentials` argument in JSON format offers the following settings:
| Option              | Description                               |
@@ -53,7 +54,7 @@ SELECT pgfs.create_storage_location('internal_ai_project', 's3://my-company-ai-i
    );
```

-## Example: non-AWS S3 / S3-compatible
+## Example: non-AWS S3 / S3-compatible with HTTPS
This is an example of using an S3-compatible system like minIO. The `endpoint` must be provided in this case; it can only be omitted when using AWS S3.

```sql
@@ -63,4 +64,16 @@ SELECT pgfs.create_storage_location('ai_images_local_minio', 's3://my-ai-images'
    );
```

+## Example: non-AWS S3 / S3-compatible with HTTP
+This is an example of using an S3-compatible system like minIO. The `endpoint` must be provided in this case; it can only be omitted when using AWS S3.
+
+In this case, the server does not use TLS encryption, so we configure a plain HTTP connection.
+
+```sql
+SELECT pgfs.create_storage_location('ai_images_local_minio', 's3://my-ai-images',
+        options => '{"endpoint": "http://minio-api.apps.local", "allow_http":"true"}',
+        credentials => '{"access_key_id": "my_username", "secret_access_key":"my_password"}'
+    );
+```
+
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/concepts.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/concepts.mdx
index f5667266fdc..6e38969954b 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/concepts.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/concepts.mdx
@@ -34,6 +34,12 @@ Bulk data preparation performs a preparer's associated operation for all of the
Bulk data preparation does not delete existing destination data unless it conflicts with newly generated data. It is recommended to configure separate destination tables for each preparer.
!!!

+## Unnesting
+
+Some Preparer [Primitives](./primitives) transform the shape of the data they are given. For example, `ChunkText` receives one text block and produces one or more text blocks. Rather than returning nested collections of results, these Primitives automatically unnest (or "explode") their output, using a new `part_id` column to track the additional dimension.
+
+You can see this in action in [Primitives](./primitives) and in the applicable [examples](./examples).
+
## Consistency with source data

To ensure correct and consistent data, the prepared destination data must be in sync with the source data. In the case of the table data source, you can enable preparer auto processing to inform the preparer pipeline about changes to the source data.
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chained_preparers.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chained_preparers.mdx
new file mode 100644
index 00000000000..dab273a4fc8
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chained_preparers.mdx
@@ -0,0 +1,121 @@
+---
+title: Preparer Chaining Example
+navTitle: Preparer Chaining
+description: Example of chaining multiple preparers with auto processing in AI Accelerator.
+---
+
+This example chains multiple preparers together with auto processing, using the [ChunkText](../primitives#chunk-text) and [SummarizeText](../primitives#summarize-text) operations in AI Accelerator.
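+
+After working through the steps below, you can inspect the whole chain by joining the intermediate and final tables. A minimal sketch, assuming the tables created in this example:
+
+```sql
+-- Pair each chunk with its summary once both preparers have run
+SELECT c.id, c.part_id, c.chunk, s.summary
+FROM chunked_data__1321 c
+JOIN summarized_data__1321 s ON s.chunk_unique_id = c.unique_id
+ORDER BY c.id, c.part_id;
+```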
+ +## Create the first Preparer to chunk text + +```sql +-- Create source test table +CREATE TABLE source_table__1321 +( + id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, + content TEXT NOT NULL +); + +SELECT aidb.create_table_preparer( + name => 'chunking_preparer__1321', + operation => 'ChunkText', + source_table => 'source_table__1321', + source_key_column => 'id', + source_data_column => 'content', + destination_table => 'chunked_data__1321', + destination_data_column => 'chunk', + destination_key_column => 'id', + options => '{"desired_length": 1, "max_length": 1000}'::JSONB -- Configuration for the ChunkText operation +); +``` + +## Create the second Preparer to summarize the chunked text + +```sql +-- Create the model. It must support the decode_text and decode_text_batch operations. +SELECT aidb.create_model('model__1321', 't5_local'); + +SELECT aidb.create_table_preparer( + name => 'summarizing_preparer__1321', + operation => 'SummarizeText', + source_table => 'chunked_data__1321', -- Reference the output from the ChunkText preparer + source_key_column => 'unique_id', -- Reference the unique column from the output of the ChunkText preparer + source_data_column => 'chunk', -- Reference the output from the ChunkText preparer + destination_table => 'summarized_data__1321', + destination_data_column => 'summary', + destination_key_column => 'chunk_unique_id', + options => '{"model": "model__1321"}'::JSONB -- Configuration for the SummarizeText operation +); +``` + +!!! Tip +This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail. +!!! + +## Set both Preparers to Live automatic processing + +```sql +SELECT aidb.set_auto_preparer('chunking_preparer__1321', 'Live'); +SELECT aidb.set_auto_preparer('summarizing_preparer__1321', 'Live'); +``` + +## Insert data for processing + +Now, when we insert data into the source data table, we see processed results flowing automatically... + +```sql +INSERT INTO source_table__1321 +VALUES (1, 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. This enables processing or storage of data in manageable parts.'); +``` + +Chunks calculated automatically: + +```sql +SELECT * FROM chunked_data__1321; + +__OUTPUT__ + id | part_id | unique_id | chunk +----+---------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------- + 1 | 0 | 1.part.0 | This is a significantly longer text example that might require splitting into smaller chunks. + 1 | 1 | 1.part.1 | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. + 1 | 2 | 1.part.2 | This enables processing or storage of data in manageable parts. +(3 rows) +``` + +Summaries of the chunks calculated automatically: + +```sql +SELECT * FROM summarized_data__1321; + +__OUTPUT__ + chunk_unique_id | summary +-----------------+------------------------------------------------------------------------------------------------------ + 1.part.0 | text example might require splitting into smaller chunks . + 1.part.1 | the purpose of this function is to partition text data into segments of a specified maximum length . 
+ 1.part.2 | enables processing or storage of data in manageable parts . +(3 rows) +``` + +The same automatic flow of logic occurs for deletions: + +```sql +DELETE FROM source_table__1321 WHERE id = 1; +``` + +```sql +SELECT * FROM chunked_data__1321; + +__OUTPUT__ + id | part_id | unique_id | chunk +----+---------+-----------+------- +(0 rows) +``` + +```sql +SELECT * FROM summarized_data__1321; + +__OUTPUT__ + chunk_unique_id | summary +-----------------+--------- +(0 rows) +``` diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx index aa4340663cb..0c617d70705 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx @@ -3,7 +3,11 @@ title: Preparers chunk text operation examples navTitle: Chunk text description: Examples of using preparers with the ChunkText operation in AI Accelerator. --- -These dxamples use preparers with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator. +These examples use preparers with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator. + +!!! Tip +This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail. +!!! ## Primitive @@ -11,20 +15,61 @@ These dxamples use preparers with the [ChunkText operation](../primitives#chunk- -- Only specify a desired length SELECT * FROM aidb.chunk_text('This is a simple test sentence.', '{"desired_length": 10}'); +__OUTPUT__ + part_id | chunk +---------+----------- + 0 | This is a + 1 | simple + 2 | test + 3 | sentence. +(4 rows) +``` + +```sql -- Specify a desired length and a maximum length SELECT * FROM aidb.chunk_text('This is a simple test sentence.', '{"desired_length": 10, "max_length": 15}'); +__OUTPUT__ + part_id | chunk +---------+------------- + 0 | This is a + 1 | simple test + 2 | sentence. +(3 rows) +``` + +```sql -- Named parameters -SELECT - chunk_id, - chunk -FROM aidb.chunk_text( +SELECT * FROM aidb.chunk_text( input => 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. This enables processing or storage of data in manageable parts.', - options => '{"desired_length": 10}' + options => '{"desired_length": 40}' ); +__OUTPUT__ + part_id | chunk +---------+---------------------------------------- + 0 | This is a significantly longer text + 1 | example that might require splitting + 2 | into smaller chunks. + 3 | The purpose of this function is to + 4 | partition text data into segments of a + 5 | specified maximum length, for example, + 6 | this sentence 145 is characters. + 7 | This enables processing or storage of + 8 | data in manageable parts. +(9 rows) +``` + +```sql -- Semantic chunking to split into the largest continuous semantic chunk that fits in the max_length SELECT * FROM aidb.chunk_text('This sentence should be its own chunk. This too.', '{"desired_length": 1, "max_length": 1000}'); + +__OUTPUT__ + part_id | chunk +---------+---------------------------------------- + 0 | This sentence should be its own chunk. + 1 | This too. 
+(2 rows) ``` ## Preparer with table data source @@ -56,12 +101,13 @@ SELECT aidb.bulk_data_preparation('preparer__1628'); SELECT * FROM chunked_data__1628; --- Unnest chunk text arrays -SELECT - id, - chunk_number, - chunk -FROM - chunked_data__1628, - unnest(chunks) WITH ORDINALITY AS chunk_list(chunk, chunk_number); +__OUTPUT__ + id | part_id | unique_id | chunks +----+---------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------- + 1 | 0 | 1.part.0 | This is a significantly longer text example that might require splitting into smaller chunks. + 1 | 1 | 1.part.1 | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. + 1 | 2 | 1.part.2 | This enables processing or storage of data in manageable parts. + 2 | 0 | 2.part.0 | This sentence should be its own chunk. + 2 | 1 | 2.part.1 | This too. +(5 rows) ``` diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx index 629bd2f3daf..c64bb9ef63e 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx @@ -4,7 +4,11 @@ navTitle: Auto Processing description: Examples of using the preparer auto processing in AI Accelerator. --- -Examples of using preparer auto processing with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator. +Example of using preparer auto processing with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator. + +!!! Tip +This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail. +!!! ## Preparer with table data source @@ -22,7 +26,7 @@ SELECT aidb.create_table_preparer( source_table => 'source_table__1628', source_data_column => 'content', destination_table => 'chunked_data__1628', - destination_data_column => 'chunks', + destination_data_column => 'chunk', source_key_column => 'id', destination_key_column => 'id', options => '{"desired_length": 1, "max_length": 1000}'::JSONB -- Configuration for the ChunkText operation @@ -32,14 +36,54 @@ SELECT aidb.set_auto_preparer('preparer__1628', 'Live'); INSERT INTO source_table__1628 VALUES (1, 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. This enables processing or storage of data in manageable parts.'); +``` + +```sql SELECT * FROM chunked_data__1628; +__OUTPUT__ + id | part_id | unique_id | chunk +----+---------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------- + 1 | 0 | 1.part.0 | This is a significantly longer text example that might require splitting into smaller chunks. + 1 | 1 | 1.part.1 | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. + 1 | 2 | 1.part.2 | This enables processing or storage of data in manageable parts. 
+(3 rows) +``` + +```sql INSERT INTO source_table__1628 VALUES (2, 'This sentence should be its own chunk. This too.'); +``` + +```sql SELECT * FROM chunked_data__1628; +__OUTPUT__ + id | part_id | unique_id | chunk +----+---------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------- + 1 | 0 | 1.part.0 | This is a significantly longer text example that might require splitting into smaller chunks. + 1 | 1 | 1.part.1 | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. + 1 | 2 | 1.part.2 | This enables processing or storage of data in manageable parts. + 2 | 0 | 2.part.0 | This sentence should be its own chunk. + 2 | 1 | 2.part.1 | This too. +(5 rows) +``` + +```sql DELETE FROM source_table__1628 WHERE id = 1; +``` + +```sql SELECT * FROM chunked_data__1628; +__OUTPUT__ + id | part_id | unique_id | chunk +----+---------+-----------+---------------------------------------- + 2 | 0 | 2.part.0 | This sentence should be its own chunk. + 2 | 1 | 2.part.1 | This too. +(2 rows) +``` + +```sql SELECT aidb.set_auto_preparer('preparer__1628', 'Disabled'); ``` diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_html.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_html.mdx index bf611ad5515..19e82bd8e6d 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_html.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_html.mdx @@ -14,6 +14,17 @@ SELECT * FROM aidb.parse_html( '

        <h1>Hello World Heading</h1>
        <p>Hello World paragraph</p>

' );
+__OUTPUT__
+       parse_html
+-----------------------
+ Hello World Heading  +
+                      +
+                      +
+ Hello World paragraph+
+
+(1 row)
+```
+
+```sql
-- Parse Hello World HTML to plaintext
SELECT * FROM aidb.parse_html(
    html =>
        '<html>
        <body>
            <h1>Hello, world!</h1>
            <p>This is my first web page.</p>
            <p>It contains some <b>bold text</b>, some <i>italic test</i>, and a <a href="https://google.com">link</a>.</p>
            <img src="postgres_logo.png" alt="Postgres Logo Image">
            <ol>
                <li>List item</li>
                <li>List item</li>
                <li>List item</li>
            </ol>
        </body>
        </html>',
    options => '{"method": "StructuredPlaintext"}' -- Default
);
+__OUTPUT__
+                         parse_html
+-----------------------------------------------------------
+ Hello, world!                                             +
+                                                           +
+ This is my first web page.                                +
+                                                           +
+ It contains some bold text, some italic test, and a link.+
+                                                           +
+ Postgres Logo Image                                       +
+                                                           +
+ List item                                                 +
+                                                           +
+ List item                                                 +
+                                                           +
+ List item
+
+(1 row)
+```
+
+```sql
-- Parse Hello World HTML to markdown-esque text that retains some syntactical context
SELECT * FROM aidb.parse_html(
    html =>
        '<html>
        <body>
            <h1>Hello, world!</h1>
            <p>This is my first web page.</p>
            <p>It contains some <b>bold text</b>, some <i>italic test</i>, and a <a href="https://google.com">link</a>.</p>
            <img src="postgres_logo.png" alt="Postgres Logo Image">
            <ol>
                <li>List item</li>
                <li>List item</li>
                <li>List item</li>
            </ol>
        </body>
        </html>',
    options => '{"method": "StructuredMarkdown"}'
);
+
+__OUTPUT__
+                                       parse_html
+---------------------------------------------------------------------------------------
+ # Hello, world!                                                                       +
+                                                                                       +
+ This is my first web page.                                                            +
+                                                                                       +
+ It contains some **bold text**, some *italic test*, and a [link](https://google.com).+
+                                                                                       +
+ ![Postgres Logo Image](postgres_logo.png)                                             +
+                                                                                       +
+ 1. List item                                                                          +
+                                                                                       +
+ 2. List item                                                                          +
+                                                                                       +
+ 3. List item
+
+(1 row)
```

## Preparer with table data source
@@ -81,4 +126,15 @@ SELECT aidb.create_table_preparer(
SELECT aidb.bulk_data_preparation('preparer__2772');

SELECT * FROM destination_table__2772;
+
+__OUTPUT__
+ id |                       parsed_html
+----+-------------------------------------------------------
+  1 | Hello World Heading                                   +
+    |                                                       +
+    | Hello World paragraph                                 +
+    |
+  2 | This is some bold text, some italic test, and a link.+
+    |
+(2 rows)
```
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx
index 8f2f3f99328..55b3c5bf616 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx
@@ -6,6 +6,10 @@ description: Examples of using preparers with the ParsePdf operation in AI Accel

These examples use preparers with the [ParsePdf operation](../primitives#parse-pdf) in AI Accelerator.

+!!! Tip
+This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail.
+!!!
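+
+Since each page of a PDF becomes its own output row, you can reassemble a document from its parts. A minimal sketch, assuming the preparer destination table created later on this page (`destination_table__6124`):
+
+```sql
+-- Concatenate the per-page parts of each document in page order
+SELECT id, string_agg(parsed_pdf, E'\n' ORDER BY part_id) AS full_text
+FROM destination_table__6124
+GROUP BY id;
+```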
+ ## Primitive ```sql @@ -15,11 +19,27 @@ SELECT * FROM aidb.parse_pdf( decode('255044462d312e340a25b89a929d0a312030206f626a3c3c2f547970652f436174616c6f672f50616765732033203020523e3e0a656e646f626a0a322030206f626a3c3c2f50726f64756365722847656d426f782047656d426f782e50646620312e37202831372e302e33352e313034323b202e4e4554204672616d65776f726b29292f4372656174696f6e4461746528443a32303231313032383135313732312b303227303027293e3e0a656e646f626a0a332030206f626a3c3c2f547970652f50616765732f4b6964735b34203020525d2f436f756e7420312f4d65646961426f785b302030203539352e3332203834312e39325d3e3e0a656e646f626a0a342030206f626a3c3c2f547970652f506167652f506172656e742033203020522f5265736f75726365733c3c2f466f6e743c3c2f46302036203020523e3e3e3e2f436f6e74656e74732035203020523e3e0a656e646f626a0a352030206f626a3c3c2f4c656e6774682035393e3e73747265616d0a42540a2f46302031322054660a3120302030203120313030203730322e3733363636363720546d0a2848656c6c6f20576f726c642129546a0a45540a656e6473747265616d0a656e646f626a0a362030206f626a3c3c2f547970652f466f6e742f537562747970652f54797065312f42617365466f6e742f48656c7665746963612f4669727374436861722033322f4c61737443686172203131342f5769647468732037203020522f466f6e7444657363726970746f722038203020523e3e0a656e646f626a0a372030206f626a5b3237382032373820302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203732322030203020302030203020302030203020302030203020302030203020393434203020302030203020302030203020302030203020302030203535362035353620302030203020302030203020323232203020302035353620302030203333335d0a656e646f626a0a382030206f626a3c3c2f547970652f466f6e7444657363726970746f722f466c6167732033322f466f6e744e616d652f48656c7665746963612f466f6e7446616d696c792848656c766574696361292f466f6e74576569676874203530302f4974616c6963416e676c6520302f466f6e7442426f785b2d313636202d3232352031303030203933315d2f436170486569676874203731382f58486569676874203532332f417363656e74203731382f44657363656e74202d3230372f5374656d482037362f5374656d562038383e3e0a656e646f626a0a787265660a3020390a303030303030303030302036353533352066200a30303030303030303135203030303030206e200a30303030303030303539203030303030206e200a30303030303030313739203030303030206e200a30303030303030323537203030303030206e200a30303030303030333436203030303030206e200a30303030303030343531203030303030206e200a30303030303030353733203030303030206e200a30303030303030373733203030303030206e200a747261696c65720a3c3c2f526f6f742031203020522f49445b3c39333932413539463342453742383430383035443632373436453841344632393e3c39333932413539463342453742383430383035443632373436453841344632393e5d2f496e666f2032203020522f53697a6520393e3e0a7374617274787265660a3938380a2525454f460a', 'hex') ); +__OUTPUT__ + part_id | text +---------+-------------- + 0 | Hello World!+ + | +(1 row) +``` + +```sql -- Manually specify the default options SELECT * FROM aidb.parse_pdf( bytes => 
decode('255044462d312e340a25b89a929d0a312030206f626a3c3c2f547970652f436174616c6f672f50616765732033203020523e3e0a656e646f626a0a322030206f626a3c3c2f50726f64756365722847656d426f782047656d426f782e50646620312e37202831372e302e33352e313034323b202e4e4554204672616d65776f726b29292f4372656174696f6e4461746528443a32303231313032383135313732312b303227303027293e3e0a656e646f626a0a332030206f626a3c3c2f547970652f50616765732f4b6964735b34203020525d2f436f756e7420312f4d65646961426f785b302030203539352e3332203834312e39325d3e3e0a656e646f626a0a342030206f626a3c3c2f547970652f506167652f506172656e742033203020522f5265736f75726365733c3c2f466f6e743c3c2f46302036203020523e3e3e3e2f436f6e74656e74732035203020523e3e0a656e646f626a0a352030206f626a3c3c2f4c656e6774682035393e3e73747265616d0a42540a2f46302031322054660a3120302030203120313030203730322e3733363636363720546d0a2848656c6c6f20576f726c642129546a0a45540a656e6473747265616d0a656e646f626a0a362030206f626a3c3c2f547970652f466f6e742f537562747970652f54797065312f42617365466f6e742f48656c7665746963612f4669727374436861722033322f4c61737443686172203131342f5769647468732037203020522f466f6e7444657363726970746f722038203020523e3e0a656e646f626a0a372030206f626a5b3237382032373820302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203732322030203020302030203020302030203020302030203020302030203020393434203020302030203020302030203020302030203020302030203535362035353620302030203020302030203020323232203020302035353620302030203333335d0a656e646f626a0a382030206f626a3c3c2f547970652f466f6e7444657363726970746f722f466c6167732033322f466f6e744e616d652f48656c7665746963612f466f6e7446616d696c792848656c766574696361292f466f6e74576569676874203530302f4974616c6963416e676c6520302f466f6e7442426f785b2d313636202d3232352031303030203933315d2f436170486569676874203731382f58486569676874203532332f417363656e74203731382f44657363656e74202d3230372f5374656d482037362f5374656d562038383e3e0a656e646f626a0a787265660a3020390a303030303030303030302036353533352066200a30303030303030303135203030303030206e200a30303030303030303539203030303030206e200a30303030303030313739203030303030206e200a30303030303030323537203030303030206e200a30303030303030333436203030303030206e200a30303030303030343531203030303030206e200a30303030303030353733203030303030206e200a30303030303030373733203030303030206e200a747261696c65720a3c3c2f526f6f742031203020522f49445b3c39333932413539463342453742383430383035443632373436453841344632393e3c39333932413539463342453742383430383035443632373436453841344632393e5d2f496e666f2032203020522f53697a6520393e3e0a7374617274787265660a3938380a2525454f460a', 'hex'), options => '{"method": "Structured", "allow_partial_parsing": true}' -- Default ); + +__OUTPUT__ + part_id | text +---------+-------------- + 0 | Hello World!+ + | +(1 row) ``` ## Preparer with table data source @@ -51,12 +71,12 @@ SELECT aidb.bulk_data_preparation('preparer__6124'); SELECT * FROM destination_table__6124; --- Unnest chunk text arrays -SELECT - id, - page_number, - parsed_text -FROM - destination_table__6124, - unnest(parsed_pdf) WITH ORDINALITY AS pdf_pages(parsed_text, page_number); +__OUTPUT__ + id | part_id | unique_id | parsed_pdf +----+---------+-----------+-------------- + 1 | 0 | 1.part.0 | Hello World!+ + | | | + 2 | 0 | 2.part.0 | Hello World!+ + | | | +(2 rows) ``` diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx index 0985759af7c..befcbb7e777 
100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx @@ -6,6 +6,10 @@ description: Examples of using preparers with the PerformOcr operation in AI Acc Examples of using preparers with the [PerformOcr operation](../primitives#summarize-text) in AI Accelerator. +!!! Tip +This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail. +!!! + ## Model creation (required) This step is required for primitive single execution and for preparer bulk execution. @@ -28,11 +32,25 @@ SELECT * FROM aidb.perform_ocr( options => '{"model": "my_paddle_ocr_model"}' ); +__OUTPUT__ + part_id | text +---------+------------------ + 0 | Tesseract sample +(1 row) +``` + +```sql -- Positional arguments SELECT * FROM aidb.perform_ocr( decode('/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAUFBQUFBQUGBgUICAcICAsKCQkKCxEMDQwNDBEaEBMQEBMQGhcbFhUWGxcpIBwcICkvJyUnLzkzMzlHREddXX0BBQUFBQUFBQYGBQgIBwgICwoJCQoLEQwNDA0MERoQExAQExAaFxsWFRYbFykgHBwgKS8nJScvOTMzOUdER11dff/CABEIAWYDKgMBIgACEQEDEQH/xAAvAAEAAQUBAQEAAAAAAAAAAAAAAQIDBAcIBgkFAQEAAAAAAAAAAAAAAAAAAAAA/9oADAMBAAIQAxAAAADsuKLhCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRE2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAeS/INiNdjYj8z9MAAAAAAAAAAFJUiQAAAAAAAAA8l60AAAAAAAAAAAAAtXbV0pqpqAAAAAAAAAAESNE617AwD5kdJap+gp5L2PC/459AJ5c/ROko4DxD6EU+I4rPoY+enVJuJzfpI79ngXrU2LT88PfnaEcVfhHek8K9jno6fnrmH0A5c/S5YO89icId3FSOODsiOC847meN4qPoPPzx6iN0zw9hHeD5yeqO8Y+cfeZ6ev5uepO+nzz71OD/AKDfPnsc2BHz7xj6HPBe9AAAAAAAAAAALV21dKaqagAAAAAAAAAABgZ+AcB9U8rddHGO7vC9XHz43/me6NMbI3JxwdYcV9C7HNYbf17pY1f0lo3uI4n3V+3+oaE745D6qNBYHPvdp89uldE78PWb15730as1P+h+cbw2VrDZw4F7757Nd+52Vwsd76d8b6I8H6LZ2sjWvf8AwN9Bjh7qvlbrY0P7bxfuDkP6Q/M/6UnPuDbzzRP6f5npD9GxuT9Y557i576EAAAAAAAAAAALV21dKaqagAAAAAAAAAAB+f8AoDkPrmscZeZ7zHN3ot3jgnI7uHhuRe8xwb0NuwaL5774HEXUXuxzt0HeHCE93DgLq7Z44Rx+9hqWxuEcQ9vRJHMHUA4O/Y7ZGpOWfoCOJMTuYci9cVDmPf8A+6NIei2bQfMjZn5vf5wr2z+kOReivYDgivvMeI9wAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAiRh5gAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVu2ZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHFd2xfIgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIrD//EADIQAAAFBAEEAQMDAgcBAAAAAAABAwQFAgYHERQIEDI0YBIhQBYxNxUXExgiMzZBUTX/2gAIAQEAAQgAM/2IaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaM
aMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMGejIjCn+9R8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl+PWJq+bPtt3Q0mv7s40H92caA8uY2IvtEzEXOsUZCK/FMGYI/w6vsQjr6tGWlFIljTv/v8qvzT7Uewl+PV+32yLgyLyLO0y7qS6V4JjHP3dOLLBaZCuv8Aobk+k6AFiWehYtuM4Fv/AKgRgxVVsEezBjf3BmewRgz0NmCP7gwdX/h1aLY2exv/AMI/uDFVWwZiqrY6nZyZhIi0zjMSvXL7G9ouXdPYx9QIwZ6BfuDq+w+obH1D6h9Q+ob0Nj6hsGMOfzu9HiPqBH+TX5p9qPYS/I0LiPcFOjplL68nBdZJuiqsreWX76v+4lLesZ/jvPVutq5YsIZrf3O8/TNyZ3y5I2OTSFhY7Hud7uaJTJxuRsoYkuNCPuS5L4joCyFrtDGYzJmSQd1xU3B5xxeknLr4ayVXkS3FFHubsyubKqRgYJpYGe7sbUzNdr5ZyDjC4k4a8MgTSieOLgmIiAzpdUPa0qzPEUNll3e8RO3FlDMt0zN0r2jZR4kzyuib6vFOWrzgL0ZWrdFz3JHWlBSM3Jr35ljME8sxgZayc8WEzrmDw9lx9kSMmIaRyVCZPim8VXe2LbfzKudmvWlAqPQzNma40blVs+1KMUZ7WQ/qJ42zFeFrXY3te8r7vBnY1tSE46j5TNmY3rpeMuC182Y1bFMr4OyO8yFbjs5LM+QbptfKRos6Es5Zg/x5dkyvnJ2ILppYzNUnmjNDp28g5STy/h+XZFI2fciV22nDzidWYb7gLunq6GdjZ5vZrTMqxeSsoYluGiPuGCmWdww7CUZYc/nZ6MjX0yx/bjiVcsl83ZiXcPWD93mrD7xs6kMb36wyBbbeUQ/Hr80+1HsJfk3Ee4KdHTL/ACcM9SriIxjOKIYeynb+M05ZV5/mxt0K3SweZQZXFCdR2PZ2adR91RNn9TEtAs20XcJ3zhLLTqPSnb+sei4McvrYhrMv28sLSD+NcodRWPLqj14q48cwOP4ti5fWYtdbOjKbm5Zz/NfbtJERZgytb2TG0WbS2ZNaT6ZJMluma02U7c0tLvf+vs6h8K4muKidcSnVLaTYq6Y5tcLy6svQU296q5RZC2bYjKOnSAaxWN457QommqnWnXaOFrQsqfonYnqxMv6PZowyZHiqySMVHovvmmxrqtW/nV3xtu9VWkk0bijHuD8qz9Ekd9Wnbl4QpsrijMq4cxdG1QUHkbqGQu+3ZSChukv1r2GfUKHeYTQrYMGsYyaMWvVikmU9ai9ONo5tH2LaTZr1VoUfo6AVPAf8VQQx5ANbjzXWzdU0kRERdS1uspGw/wCrH0uy67yy5Rirhn+eXY6pZZda7ouMFsdSNqWxARcO1vDqLtS7rblIRx0pyaqc9cUaZfj1+afaj2EvybiPcFOjpl/k4ZphlZvGlxtkenJGyJn+sQk//bTH5Ee17xxknfqNqxV75zj7Euv9OO5CybEu9qi5eZ1xRbthpxsnCWdlF3a+E4K4ZOyrstTMMCu8Xv7AtgPoeUkGHTBKvm15ykRSszibQzc5bXCjjrHLpFNZvk55ivGqEcS8k6jpDAs6/jekoqDZ3mZz7xWMgZyQRxHbLXKGQXn6lj7CseCR+prPS8ZOZ7YvYvqohl3Vq29LJ9N11sZexW8IcjJM4pg6fvsU5ou/IN81xVfVgW4izRhqojxdZZkW9isN89wru917NfzuK8f3GVZvst2UzxjeDJCFz7dkypj3HiAwhiOyntmxdwymYCtG0Mdz6KfSWW0L1pLOP80tu3Vl/wDYs4WP/wAMtQdVf/CoMYC/iWCFm3OhaGYlJVw2dJOUUXDbqZu1kytJK3qOmaHVj7BcvVMM/wA8ux1Vw6qM/b8xTj23cYXhacRJNrqtbGFpwT+ZfYVua1LudTDqF/Hr80+1HsJfkyyCrqKlWyWEcSXpZV8HLzCqdKidSdd/dO00hLqzVkHavUi/QOLWxDg+myXZTk7l7EDXIyDZ2zY2P1D2jQbGJa4PypfEmg7vFTH1uHZX6NN1g7KlkSqjm0Xlk9RN2pkxlsQ4haY4aLuXWXcNMsip0SLJlY3UNahVMIm2sAXpc03RLX5eltqSNgTdvw+AMe3LYTe4051dJJwiskrceAr+tafVkrO/QfUJeRExmrs6e7ytqSil7XtO3Zaaxq2hL9mMBZFs2YqkbMe48z/exJs5vFeLo/GsWskWZscOskW22aMLKxbmi25q30zpFX7fbL+B1btk1bhtttb3UrFJUMG1qdPd2zs4lL33lXGTfIFsNY1tGWDn+zf8VhCRuA7/ALwVXf3vi3HWXLNvOPoLKOI7zufJSE9GfcZ/xjdl+SNuLwdrMXMbbsAwc56sefvm2othCYpt2UtOw4yHlLbs6i/MmzkFWlYnUJZlNcbCW50/XxdMyUnesZGs4dg0jmOOMS3lbmVHFxSF62bE31AuYiTVwxmCxpBeu1VMVZvv1y3TuawLHi7AgUYlh+PX5p9qPYS/K120Q0Q0Q0Q0NENENENDRDQ0NENF30Q0NENENENDRdtF+3fQ0Q0WxoaIaGiGiGiGi7q/7dQwwnWWd3lRl+/30Q0NENECIiGi/Jr80+1HsJfAzLYoYR6KhrIfn1+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2WqIlaa6uUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgProVqoOkVUFUDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDb
UEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEKaaaf2H//xABDEAABAwEFBQcBBAYIBwAAAAABAAIDBAUREpKhEBNhYrEGFDEyQVFgQCEiU1QVQlBSgpMjM0NVcoGywgcgMDSRs9H/2gAIAQEACT8A8SjojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojovXZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fUdo6SinczGI5XLttZmddtrMzrttZv8xV8dXRy3hs0f2tJb4/sztFS1FoNLw6njde8GPz5frObZx6fUW9U0jxAyLdRxtcu1lc90FPJKGmBgBLGkq0JaSLu80u9jYHOvjXa+v/AJDFWPqooHvcJXtDXHEb/Af8/h9Bac9IZqucSmJ5aTc1TvmqJaO+SR5vc44vpPzVp9T9ZzbOPT6n0s2quyFfkKlSBkcbS9zifsAb9pKE0NKHmNhg/rZgPsLy70arQrH7sbx4gqzM9v8ACnD9JNYe61HlMnuxw/fVzbSqot6+o/CjVrVMYnG8jE1YYXEH1whTTz0wcHS01QcYfG7xdG5f0sHdWTQR+rzJ5Qq+aOmidc5scu4gi9mK16s0sbxfLHUGoY3/ABgpgZaVE4RVIHg/1EgCwPteaO+SXzCBjv8AeVaVVHvRvGCerMDsoXeZ6IOAmhqPtexhPnjcq4tcbO31NUR+mLwIKtGeutmsqg2nmm+/uIsIBLPckplqGzXMmc99Q8hpxs/cUksbY5jSmSEXzVEwNxwezFa8u9wYt0a8h6nmqYZ6plC5lR54Ji7A1SYaejjxkDxe7wDBxJKkngg+1zaWmO6jiZ7yPVqVckEH35TBVGcsHuWlBjLfpKKSWGSMYWzM8gdweHFVU8sT5ZW0ollD7n/rqtqWdm+9QPwGYBndhL94Ydr3wSQvZDPNH9sss0g8katWYTFuLdOryHqWaeB84pH94/raaRxuDuLUwybkBsUXrJJIbmtVdNHSxuIO7k7vTx4v1VbNX3eJwxTQ1RnDPbECmN/SNnSMimkHhIJAS1ytipZQQRU0rqWN9zHKokorMc4iCPfd3i/gVXUyNYQZ6SeTeMmid6scU6WhsdkhbEGy7iLN4uerVqRj+8zFNv4Jg0+5UeA1dNjey+/C5t4IVsz1TI6mtghglOONpc4sblVqVMTJhjiE1UYC4cGhSTzwNI3lNUnGHx+8b1JipquASMP+L0dxC/M2n1KaJJvJTQeBkkKrJ4qMO/Vl7rTt4MKral1KXAXul7xTycpKYI52nBVQfhyj6jm2cen1PpZtVdkK/I1CfhfUGOnBBuID1YE9bXVTgGzxva3BEB5V2Rr/AOexUT7PgmtOCUQuIJbicA//ACKpH1MDKQR1LGeZmFWJ3tsDBGJYzuprmqIxVsbcEIqgY/MfKHBNZHuoIu4sLvu/0P2gf5hWOdzLLfUUtQwt++BdiY5WTVUsFVHupmkCVjghA6mrLt5JE8uxYfAG9UcldTxWnNO6naQC7CTgAv8AQFdkK7+exdn56OvpHkb+V7XYoj4tTi99LR1MN/sxktzQoWyiyoY9yx/4s5P3tloystgGR4iMxneDN5jgVi11W7wYTdEFSimnrO0FBIYgCAwb5gATiGVtdLLIPfu7P/siiHebUlmqJ3n2a8xsGUJrXteC1zXC8EO9CpKwTtbIwMfJezDIF+bqV+S/3O20sktHNVx1kFREMe5mFxucuzpMgAa6ekIGZpTGPtt+7O7nvhe8xqR7KCGQVDnMl3IaYwftJVXU1UMc75SIgZcTj7vK7PTMiqmFk1RMQ4hi/fouj15JY6RmYqNsUFPEyKJjfAMaLgmAPkpKkHMFC1jTZ0Eh9LzI0PJTBjbaXVhXtP1TA+FlqVtQ5h9dy8vQAA+wAeiiAqrOqY8D/dkieXsoa26L+ML83aX+op5EFLRh4Z7ueux1YI6WBkZLZWAOcux9ZdUxEMe6RhDH+j1J/RTUrJAznH1HNs49PqfSzaq7IV+RqFFjljiE7GjzExlWXST15eJqV84F7m+BYF2Ss4Aeu7X/AA6pKyYVrIGVceHDiVgTSsDID3gPAbgmCsChqmTRh7JWxgEh3qCFVPjZVzPYaR5xFuEediop7Q3VUaIgPuOEOIa4kqw6fFE8xTU1QGyPCphZFRTQSTB8Trove5zSpXvo5qGSVzB5Q+FwAeqBktlstWZsscovaYpvK7Vdl7NkikaHMexgIcHeoK7EUFbV1TzdTsAa5sbfFxVgCyKers2SdlI32c8XPX4lOm4paShnnY26+90bCQP/ACFVvmbuZKyYF9zp34gMK7OWdTxxC8vMTSAPcl6LDRO7RWZFCWC5pEL2RphLKGufHNwE7fE5FOzv1kySMfET950T3l7XBVLIaanidJLI43BrGi8lUVKyy2RVNRKQw42RDyakL83Uoj/sbjndtsGSnAr5qE1M8jTGHMcWAldmaXfP/tYRunDiCFakr2vibVRfiwEOICkkjFtUMVTW85ETDhVBFaNbWY33Sm9kVzixWdRUs9ZSvpaaNkTQ8umC/Eouj17UOz8rVdWL+6qP/wBTV/eY/wBDl+9OjdTC1ayKd3tHM8sJUzJYZWB0b2G8ODvUFTtdXV08b3RjzMijUZYa6te9vM2ML83aX+oqAiKamfC9/GNdm7Nkl3EbKhpYC4StFxxhdlrOENMwuwlgBf7MHErsPFY/dY2MdUAg4yf1PqObZx6fUgGSekmjYD4XuaQFSQRUndZo8TJg43yJoc1zS1zT4EO9FUhoMpl7sX7p8TuQqrtEU7xc8PnAap2VNrf2IZ5YL+r1MyltimZcyU+SSP2ep6zu7XXNEE4cxWk6KIOF8k8u8kw+zGhUx/Rgg3QH6wP4l6tEzRknBLTS7p/8TSqmsFK/z94nDI1OKi16pobLMBcxjP3GKZlLbEEeASHySNHgx6qa1tMz7g7vUB0StAiEPBkY6XeTTAfbhVOwSSUHd6WLytGG64KmijdVPhMOB+NMDo5GOY5p9Q4XEKd08AlL6Z8MuCeIFWnVQUZuEneZw1t38CZJaAZHC8zMIY+OpYg6prqqCdlaHuDnXOlJj/zDVXmojBO4mil3c4BVVUmmxjGKmoAYFM2ptKrINVVe4Hgxns0KoZFX0VRv4d55X/dIc0lVZhsmntGCSohZV3s3QlBftnigtGQX1MDzgErx9gcF
U2juQLmXTtcq4iLGHyxbzezy+zT7KRlLV0Lg+iP6jcIuwkexCNZDTueQe7TAxK3XtkER3Mb5N7IXdGBU09JZTq2H9IvZIN1NDG9UkL6AClveZA3yFFUsMrKWCdk2OTBcZCCgN9S0FNDIAbwHxsAOoUDJJ4a4SyBz8ADcJCiZHVwiTG1hxAXkkKpMBmntN7JPaSMuIVfVyUYJDO6zAsVaYYXODpy9+9nkChbFTU8QjiY30DVSRMs581a8PbMC66YnCmnC770Uo80UjfBwVdJNCfLLSS4C4c7SrRlZTA+ermvDOIaEMb/PUTnzSv8AqObZx6ftQf8AX9nJhwie1P2BzbOPT4JQwRyG++RsbQ84uLR+wObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZ4BPOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKN4F/p8mGz//xAAUEQEAAAAAAAAAAAAAAAAAAACg/9oACAECAQE/ABK//8QAFBEBAAAAAAAAAAAAAAAAAAAAoP/aAAgBAwEBPwASv//Z', 'base64'), '{"model": "my_paddle_ocr_model"}' ); + +__OUTPUT__ + part_id | text +---------+------------------ + 0 | Tesseract sample +(1 row) ``` ## Preparer with table data source @@ -62,6 +80,38 @@ SELECT aidb.create_table_preparer( SELECT aidb.bulk_data_preparation('preparer__1527'); SELECT * FROM ocr_data__1527; + +__OUTPUT__ + id | part_id | unique_id | parsed__text +----+---------+-----------+-------------------------------------------- + 1 | 0 | 1.part.0 | Trunch Parish Council + 1 | 1 | 1.part.1 | BANK RECONCILIATION AS AT 31STOCTOBER 2019 + 1 | 2 | 1.part.2 | Account: + 1 | 3 | 1.part.3 | 14,389.43 + 1 | 4 | 1.part.4 | BANK STATEMENT BALANCE 3OTH SEPTEMBER 2019 + 1 | 5 | 1.part.5 | 83.60 + 1 | 6 | 1.part.6 | PREVIOUS OUTSTANDING CHEQUES + 1 | 7 | 1.part.7 | 14,305.83 + 1 | 8 | 1.part.8 | CASHBOOK BALANCE 31ST OCTOBER 2019 + 1 | 9 | 1.part.9 | ADD CHEQUES OUTSTANDING: + 1 | 10 | 1.part.10 | * + 1 | 11 | 1.part.11 | 101719 + 1 | 12 | 1.part.12 | 83.60* + 1 | 13 | 1.part.13 | * + 1 | 14 | 1.part.14 | * + 1 | 15 | 1.part.15 | 83.60 + 1 | 16 | 1.part.16 | OUTSTANDING CHEQUES + 1 | 17 | 1.part.17 | 9,148.00 + 1 | 18 | 1.part.18 | RECEIPTS + 1 | 19 | 1.part.19 | 4,309.94 + 1 | 20 | 1.part.20 | PAYMENTS + 1 | 21 | 1.part.21 | 19,227.49 + 1 | 22 | 1.part.22 | BALANCE 31STOCTOBER2019 + 1 | 23 | 1.part.23 | 19,227.49* + 1 | 24 | 1.part.24 | BALANCE AS PER BANK STATEMENT + 1 | 25 | 1.part.25 | 0.00 + 1 | 26 | 1.part.26 | DIFFERENCE +(27 rows) ``` ## Model compatibility diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/summarize_text.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/summarize_text.mdx index 55e662801b3..c8039ebbe45 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/summarize_text.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/summarize_text.mdx @@ -24,11 +24,25 @@ SELECT * FROM aidb.summarize_text( options => '{"model": "model__1952"}' ); +__OUTPUT__ + summarize_text +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + the girl yelled to her Mom as she heard the clicking, scratching noises outside of the living room window . she regretted watching the horror show she had been tuned into for the last half hour . the front door blew open with a thunderous noise . 
+(1 row) +``` + +```sql -- Positional arguments SELECT * FROM aidb.summarize_text( 'There are times when the night sky glows with bands of color. The bands may begin as cloud shapes and then spread into a great arc across the entire sky. They may fall in folds like a curtain drawn across the heavens. The lights usually grow brighter, then suddenly dim. During this time the sky glows with pale yellow, pink, green, violet, blue, and red. These lights are called the Aurora Borealis. Some people call them the Northern Lights. Scientists have been watching them for hundreds of years. They are not quite sure what causes them. In ancient times Long Beach City College WRSC Page 2 of 2 people were afraid of the Lights. They imagined that they saw fiery dragons in the sky. Some even concluded that the heavens were on fire.', '{"model": "model__1952"}' ); + +__OUTPUT__ + summarize_text +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + the night sky glows with bands of color . they may begin as cloud shapes and then spread into a great arc across the entire sky . the lights usually grow brighter, then suddenly dim . +(1 row) ``` ## Preparer with table data source @@ -50,7 +64,7 @@ SELECT aidb.create_table_preparer( source_table => 'source_table__1952', source_data_column => 'content', destination_table => 'summarized_data__1952', - destination_data_column => 'summaries', + destination_data_column => 'summary', source_key_column => 'id', destination_key_column => 'id', options => '{"model": "model__1952"}'::JSONB -- Configuration for the SummarizeText operation @@ -59,6 +73,13 @@ SELECT aidb.create_table_preparer( SELECT aidb.bulk_data_preparation('preparer__1952'); SELECT * FROM summarized_data__1952; + +__OUTPUT__ + id | summary +----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + 1 | the girl yelled to her Mom as she heard the clicking, scratching noises outside of the living room window . she regretted watching the horror show she had been tuned into for the last half hour . the front door blew open with a thunderous noise . + 2 | the night sky glows with bands of color . they may begin as cloud shapes and then spread into a great arc across the entire sky . the lights usually grow brighter, then suddenly dim . +(2 rows) ``` ## Model compatibility @@ -88,7 +109,7 @@ SELECT aidb.create_table_preparer( source_table => 'source_table__1952', source_data_column => 'content', destination_table => 'summarized_data__1952', - destination_data_column => 'summaries', + destination_data_column => 'summary', options => '{"model": "bert_model"}'::JSONB -- Incompatible model ); __OUTPUT__ diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx index 2d0e2d556b5..62771150bbf 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx @@ -17,20 +17,25 @@ All data preparation operations can be customized with different options. The AP Call `aidb.chunk_text()` to break text into smaller chunks. 
```sql -SELECT - chunk_id, - chunk -FROM aidb.chunk_text( +SELECT * FROM aidb.chunk_text( input => 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. This enables processing or storage of data in manageable parts.', options => '{"desired_length": 120, "max_length": 150}' ); + +__OUTPUT__ + part_id | chunk +---------+--------------------------------------------------------------------------------------------------------------------------------------------------- + 0 | This is a significantly longer text example that might require splitting into smaller chunks. + 1 | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. + 2 | This enables processing or storage of data in manageable parts. +(3 rows) ``` - The `desired_length` size is the target size for the chunk. In most cases, this value also serves as the maximum size of the chunk. It's possible for a chunk to be returned that's less than the `desired` value, as adding the next piece of text may have made it larger than the `desired` capacity. - The `max_length` size is the maximum possible chunk size that can be generated. Setting this to a value larger than `desired` means that the chunk should be as close to `desired` as possible but can be larger if it means staying at a larger semantic level. -!!! Note -This primitive function returns each chunk with a `chunk_id` for ease of development. However, a preparer with the `ChunkText` operation outputs a single text array per input that can then be unnested as desired. +!!! Tip +This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](./concepts#unnesting) for more detail. !!! ## Parse HTML @@ -55,6 +60,22 @@ SELECT * FROM aidb.parse_html( ', options => '{"method": "StructuredPlaintext"}' -- Default ); + +__OUTPUT__ + parse_html +----------------------------------------------------------- + Hello, world! + + + + This is my first web page. 
+ + + + It contains some bold text, some italic test, and a link.+ + + + Postgres Logo Image + + List item + + List item + + List item + + +(1 row) ``` - The `method` determines how the HTML is parsed: @@ -70,12 +91,25 @@ SELECT * FROM aidb.parse_pdf( bytes => decode('255044462d312e340a25b89a929d0a312030206f626a3c3c2f547970652f436174616c6f672f50616765732033203020523e3e0a656e646f626a0a322030206f626a3c3c2f50726f64756365722847656d426f782047656d426f782e50646620312e37202831372e302e33352e313034323b202e4e4554204672616d65776f726b29292f4372656174696f6e4461746528443a32303231313032383135313732312b303227303027293e3e0a656e646f626a0a332030206f626a3c3c2f547970652f50616765732f4b6964735b34203020525d2f436f756e7420312f4d65646961426f785b302030203539352e3332203834312e39325d3e3e0a656e646f626a0a342030206f626a3c3c2f547970652f506167652f506172656e742033203020522f5265736f75726365733c3c2f466f6e743c3c2f46302036203020523e3e3e3e2f436f6e74656e74732035203020523e3e0a656e646f626a0a352030206f626a3c3c2f4c656e6774682035393e3e73747265616d0a42540a2f46302031322054660a3120302030203120313030203730322e3733363636363720546d0a2848656c6c6f20576f726c642129546a0a45540a656e6473747265616d0a656e646f626a0a362030206f626a3c3c2f547970652f466f6e742f537562747970652f54797065312f42617365466f6e742f48656c7665746963612f4669727374436861722033322f4c61737443686172203131342f5769647468732037203020522f466f6e7444657363726970746f722038203020523e3e0a656e646f626a0a372030206f626a5b3237382032373820302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203732322030203020302030203020302030203020302030203020302030203020393434203020302030203020302030203020302030203020302030203535362035353620302030203020302030203020323232203020302035353620302030203333335d0a656e646f626a0a382030206f626a3c3c2f547970652f466f6e7444657363726970746f722f466c6167732033322f466f6e744e616d652f48656c7665746963612f466f6e7446616d696c792848656c766574696361292f466f6e74576569676874203530302f4974616c6963416e676c6520302f466f6e7442426f785b2d313636202d3232352031303030203933315d2f436170486569676874203731382f58486569676874203532332f417363656e74203731382f44657363656e74202d3230372f5374656d482037362f5374656d562038383e3e0a656e646f626a0a787265660a3020390a303030303030303030302036353533352066200a30303030303030303135203030303030206e200a30303030303030303539203030303030206e200a30303030303030313739203030303030206e200a30303030303030323537203030303030206e200a30303030303030333436203030303030206e200a30303030303030343531203030303030206e200a30303030303030353733203030303030206e200a30303030303030373733203030303030206e200a747261696c65720a3c3c2f526f6f742031203020522f49445b3c39333932413539463342453742383430383035443632373436453841344632393e3c39333932413539463342453742383430383035443632373436453841344632393e5d2f496e666f2032203020522f53697a6520393e3e0a7374617274787265660a3938380a2525454f460a', 'hex'), options => '{"method": "Structured", "allow_partial_parsing": true}' -- Default ); + +__OUTPUT__ + part_id | text +---------+-------------- + 0 | Hello World!+ + | +(1 row) ``` - The `method` determines how the PDF is parsed: - `Structured` (Default) — Algorithmic text extraction. - The `allow_partial_parsing` flag determines whether to continue to parse PDFs when the parser encounters errors on one or more pages. Defaults to `true`. +- The `part_id` column in the output references the index of the page from which the text was extracted. + +!!! 
Tip +This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](./concepts#unnesting) for more detail. +!!! + ## Summarize text Call `aidb.summarize_text()` to summarize text: @@ -88,6 +122,17 @@ SELECT * FROM aidb.summarize_text( input => 'There are times when the night sky glows with bands of color. The bands may begin as cloud shapes and then spread into a great arc across the entire sky. They may fall in folds like a curtain drawn across the heavens. The lights usually grow brighter, then suddenly dim. During this time the sky glows with pale yellow, pink, green, violet, blue, and red. These lights are called the Aurora Borealis. Some people call them the Northern Lights. Scientists have been watching them for hundreds of years. They are not quite sure what causes them. In ancient times Long Beach City College WRSC Page 2 of 2 people were afraid of the Lights. They imagined that they saw fiery dragons in the sky. Some even concluded that the heavens were on fire.', options => '{"model": "my_t5_model"}' ); + +__OUTPUT__ + create_model +-------------- + my_t5_model +(1 row) + + summarize_text +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + the night sky glows with bands of color . they may begin as cloud shapes and then spread into a great arc across the entire sky . the lights usually grow brighter, then suddenly dim . +(1 row) ``` - The `model` is the name of the created model to use for summarization. The model must support the `decode_text()` and `decode_text_batch()` [model primitives](../models/primitives). @@ -108,11 +153,26 @@ SELECT * FROM aidb.perform_ocr( 
decode('/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAUFBQUFBQUGBgUICAcICAsKCQkKCxEMDQwNDBEaEBMQEBMQGhcbFhUWGxcpIBwcICkvJyUnLzkzMzlHREddXX0BBQUFBQUFBQYGBQgIBwgICwoJCQoLEQwNDA0MERoQExAQExAaFxsWFRYbFykgHBwgKS8nJScvOTMzOUdER11dff/CABEIAWYDKgMBIgACEQEDEQH/xAAvAAEAAQUBAQEAAAAAAAAAAAAAAQIDBAcIBgkFAQEAAAAAAAAAAAAAAAAAAAAA/9oADAMBAAIQAxAAAADsuKLhCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRE2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAeS/INiNdjYj8z9MAAAAAAAAAAFJUiQAAAAAAAAA8l60AAAAAAAAAAAAAtXbV0pqpqAAAAAAAAAAESNE617AwD5kdJap+gp5L2PC/459AJ5c/ROko4DxD6EU+I4rPoY+enVJuJzfpI79ngXrU2LT88PfnaEcVfhHek8K9jno6fnrmH0A5c/S5YO89icId3FSOODsiOC847meN4qPoPPzx6iN0zw9hHeD5yeqO8Y+cfeZ6ev5uepO+nzz71OD/AKDfPnsc2BHz7xj6HPBe9AAAAAAAAAAALV21dKaqagAAAAAAAAAABgZ+AcB9U8rddHGO7vC9XHz43/me6NMbI3JxwdYcV9C7HNYbf17pY1f0lo3uI4n3V+3+oaE745D6qNBYHPvdp89uldE78PWb15730as1P+h+cbw2VrDZw4F7757Nd+52Vwsd76d8b6I8H6LZ2sjWvf8AwN9Bjh7qvlbrY0P7bxfuDkP6Q/M/6UnPuDbzzRP6f5npD9GxuT9Y557i576EAAAAAAAAAAALV21dKaqagAAAAAAAAAAB+f8AoDkPrmscZeZ7zHN3ot3jgnI7uHhuRe8xwb0NuwaL5774HEXUXuxzt0HeHCE93DgLq7Z44Rx+9hqWxuEcQ9vRJHMHUA4O/Y7ZGpOWfoCOJMTuYci9cVDmPf8A+6NIei2bQfMjZn5vf5wr2z+kOReivYDgivvMeI9wAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAiRh5gAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVu2ZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHFd2xfIgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIrD//EADIQAAAFBAEEAQMDAgcBAAAAAAABAwQFAgYHERQIEDI0YBIhQBYxNxUXExgiMzZBUTX/2gAIAQEAAQgAM/2IaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMGejIjCn+9R8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl+PWJq+bPtt3Q0mv7s40H92caA8uY2IvtEzEXOsUZCK/FMGYI/w6vsQjr6tGWlFIljTv/v8qvzT7Uewl+PV+32yLgyLyLO0y7qS6V4JjHP3dOLLBaZCuv8Aobk+k6AFiWehYtuM4Fv/AKgRgxVVsEezBjf3BmewRgz0NmCP7gwdX/h1aLY2exv/AMI/uDFVWwZiqrY6nZyZhIi0zjMSvXL7G9ouXdPYx9QIwZ6BfuDq+w+obH1D6h9Q+ob0Nj6hsGMOfzu9HiPqBH+TX5p9qPYS/I0LiPcFOjplL68nBdZJuiqsreWX76v+4lLesZ/jvPVutq5YsIZrf3O8/TNyZ3y5I2OTSFhY7Hud7uaJTJxuRsoYkuNCPuS5L4joCyFrtDGYzJmSQd1xU3B5xxeknLr4ayVXkS3FFHubsyubKqRgYJpYGe7sbUzNdr5ZyDjC4k4a8MgTSieOLgmIiAzpdUPa0qzPEUNll3e8RO3FlDMt0zN0r2jZR4kzyuib6vFOWrzgL0ZWrdFz3JHWlBSM3Jr35ljME8sxgZayc8WEzrmDw9lx9kSMmIaRyVCZPim8VXe2LbfzKudmvWlAqPQzNma40blVs+1KMUZ7WQ/qJ42zFeFrXY3te8r7vBnY1tSE46j5TNmY3rpeMuC182Y1bFMr4OyO8yFbjs5LM+QbptfKRos6Es5Zg/x5dkyvnJ2ILppYzNUnmjNDp28g5STy/h+XZFI2fciV22n
DzidWYb7gLunq6GdjZ5vZrTMqxeSsoYluGiPuGCmWdww7CUZYc/nZ6MjX0yx/bjiVcsl83ZiXcPWD93mrD7xs6kMb36wyBbbeUQ/Hr80+1HsJfk3Ee4KdHTL/ACcM9SriIxjOKIYeynb+M05ZV5/mxt0K3SweZQZXFCdR2PZ2adR91RNn9TEtAs20XcJ3zhLLTqPSnb+sei4McvrYhrMv28sLSD+NcodRWPLqj14q48cwOP4ti5fWYtdbOjKbm5Zz/NfbtJERZgytb2TG0WbS2ZNaT6ZJMluma02U7c0tLvf+vs6h8K4muKidcSnVLaTYq6Y5tcLy6svQU296q5RZC2bYjKOnSAaxWN457QommqnWnXaOFrQsqfonYnqxMv6PZowyZHiqySMVHovvmmxrqtW/nV3xtu9VWkk0bijHuD8qz9Ekd9Wnbl4QpsrijMq4cxdG1QUHkbqGQu+3ZSChukv1r2GfUKHeYTQrYMGsYyaMWvVikmU9ai9ONo5tH2LaTZr1VoUfo6AVPAf8VQQx5ANbjzXWzdU0kRERdS1uspGw/wCrH0uy67yy5Rirhn+eXY6pZZda7ouMFsdSNqWxARcO1vDqLtS7rblIRx0pyaqc9cUaZfj1+afaj2EvybiPcFOjpl/k4ZphlZvGlxtkenJGyJn+sQk//bTH5Ee17xxknfqNqxV75zj7Euv9OO5CybEu9qi5eZ1xRbthpxsnCWdlF3a+E4K4ZOyrstTMMCu8Xv7AtgPoeUkGHTBKvm15ykRSszibQzc5bXCjjrHLpFNZvk55ivGqEcS8k6jpDAs6/jekoqDZ3mZz7xWMgZyQRxHbLXKGQXn6lj7CseCR+prPS8ZOZ7YvYvqohl3Vq29LJ9N11sZexW8IcjJM4pg6fvsU5ou/IN81xVfVgW4izRhqojxdZZkW9isN89wru917NfzuK8f3GVZvst2UzxjeDJCFz7dkypj3HiAwhiOyntmxdwymYCtG0Mdz6KfSWW0L1pLOP80tu3Vl/wDYs4WP/wAMtQdVf/CoMYC/iWCFm3OhaGYlJVw2dJOUUXDbqZu1kytJK3qOmaHVj7BcvVMM/wA8ux1Vw6qM/b8xTj23cYXhacRJNrqtbGFpwT+ZfYVua1LudTDqF/Hr80+1HsJfkyyCrqKlWyWEcSXpZV8HLzCqdKidSdd/dO00hLqzVkHavUi/QOLWxDg+myXZTk7l7EDXIyDZ2zY2P1D2jQbGJa4PypfEmg7vFTH1uHZX6NN1g7KlkSqjm0Xlk9RN2pkxlsQ4haY4aLuXWXcNMsip0SLJlY3UNahVMIm2sAXpc03RLX5eltqSNgTdvw+AMe3LYTe4051dJJwiskrceAr+tafVkrO/QfUJeRExmrs6e7ytqSil7XtO3Zaaxq2hL9mMBZFs2YqkbMe48z/exJs5vFeLo/GsWskWZscOskW22aMLKxbmi25q30zpFX7fbL+B1btk1bhtttb3UrFJUMG1qdPd2zs4lL33lXGTfIFsNY1tGWDn+zf8VhCRuA7/ALwVXf3vi3HWXLNvOPoLKOI7zufJSE9GfcZ/xjdl+SNuLwdrMXMbbsAwc56sefvm2othCYpt2UtOw4yHlLbs6i/MmzkFWlYnUJZlNcbCW50/XxdMyUnesZGs4dg0jmOOMS3lbmVHFxSF62bE31AuYiTVwxmCxpBeu1VMVZvv1y3TuawLHi7AgUYlh+PX5p9qPYS/K120Q0Q0Q0Q0NENENENDRDQ0NENF30Q0NENENENDRdtF+3fQ0Q0WxoaIaGiGiGiGi7q/7dQwwnWWd3lRl+/30Q0NENECIiGi/Jr80+1HsJfAzLYoYR6KhrIfn1+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2WqIlaa6uUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgProVqoOkVUFUDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEKaaaf2H//xABDEAABAwEFBQcBBAYIBwAAAAABAAIDBAUREpKhEBNhYrEGFDEyQVFgQCEiU1QVQlBSgpMjM0NVcoGywgcgMDSRs9H/2gAIAQEACT8A8SjojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojovXZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fUdo6SinczGI5XLttZmddtrMzrttZv8xV8dXRy3hs0f2tJb4/sztFS1FoNLw6njde8GPz5frObZx6fUW9U0jxAyLdRxtcu1lc90FPJKGmBgBLGkq0JaSLu80u9jYHOvjXa+v/AJDFWPqooHvcJXtDXHEb/Af8/h9Bac9IZqucSmJ5aTc1TvmqJaO+SR5vc44vpPzVp9T9ZzbOPT6n0s2quyFfkKlSBkcbS9zifsAb9pKE0NKHmNhg/rZgPsLy70arQrH7sbx4gqzM9v8ACnD9JNYe61HlMnuxw/fVzbSqot6+o/CjVrVMYnG8jE1YYXEH1whTTz0wcHS01QcYfG7xdG5f0sHdWTQR+rzJ5Qq+aOmidc5scu4gi9mK16
s0sbxfLHUGoY3/ABgpgZaVE4RVIHg/1EgCwPteaO+SXzCBjv8AeVaVVHvRvGCerMDsoXeZ6IOAmhqPtexhPnjcq4tcbO31NUR+mLwIKtGeutmsqg2nmm+/uIsIBLPckplqGzXMmc99Q8hpxs/cUksbY5jSmSEXzVEwNxwezFa8u9wYt0a8h6nmqYZ6plC5lR54Ji7A1SYaejjxkDxe7wDBxJKkngg+1zaWmO6jiZ7yPVqVckEH35TBVGcsHuWlBjLfpKKSWGSMYWzM8gdweHFVU8sT5ZW0ollD7n/rqtqWdm+9QPwGYBndhL94Ydr3wSQvZDPNH9sss0g8katWYTFuLdOryHqWaeB84pH94/raaRxuDuLUwybkBsUXrJJIbmtVdNHSxuIO7k7vTx4v1VbNX3eJwxTQ1RnDPbECmN/SNnSMimkHhIJAS1ytipZQQRU0rqWN9zHKokorMc4iCPfd3i/gVXUyNYQZ6SeTeMmid6scU6WhsdkhbEGy7iLN4uerVqRj+8zFNv4Jg0+5UeA1dNjey+/C5t4IVsz1TI6mtghglOONpc4sblVqVMTJhjiE1UYC4cGhSTzwNI3lNUnGHx+8b1JipquASMP+L0dxC/M2n1KaJJvJTQeBkkKrJ4qMO/Vl7rTt4MKral1KXAXul7xTycpKYI52nBVQfhyj6jm2cen1PpZtVdkK/I1CfhfUGOnBBuID1YE9bXVTgGzxva3BEB5V2Rr/AOexUT7PgmtOCUQuIJbicA//ACKpH1MDKQR1LGeZmFWJ3tsDBGJYzuprmqIxVsbcEIqgY/MfKHBNZHuoIu4sLvu/0P2gf5hWOdzLLfUUtQwt++BdiY5WTVUsFVHupmkCVjghA6mrLt5JE8uxYfAG9UcldTxWnNO6naQC7CTgAv8AQFdkK7+exdn56OvpHkb+V7XYoj4tTi99LR1MN/sxktzQoWyiyoY9yx/4s5P3tloystgGR4iMxneDN5jgVi11W7wYTdEFSimnrO0FBIYgCAwb5gATiGVtdLLIPfu7P/siiHebUlmqJ3n2a8xsGUJrXteC1zXC8EO9CpKwTtbIwMfJezDIF+bqV+S/3O20sktHNVx1kFREMe5mFxucuzpMgAa6ekIGZpTGPtt+7O7nvhe8xqR7KCGQVDnMl3IaYwftJVXU1UMc75SIgZcTj7vK7PTMiqmFk1RMQ4hi/fouj15JY6RmYqNsUFPEyKJjfAMaLgmAPkpKkHMFC1jTZ0Eh9LzI0PJTBjbaXVhXtP1TA+FlqVtQ5h9dy8vQAA+wAeiiAqrOqY8D/dkieXsoa26L+ML83aX+op5EFLRh4Z7ueux1YI6WBkZLZWAOcux9ZdUxEMe6RhDH+j1J/RTUrJAznH1HNs49PqfSzaq7IV+RqFFjljiE7GjzExlWXST15eJqV84F7m+BYF2Ss4Aeu7X/AA6pKyYVrIGVceHDiVgTSsDID3gPAbgmCsChqmTRh7JWxgEh3qCFVPjZVzPYaR5xFuEediop7Q3VUaIgPuOEOIa4kqw6fFE8xTU1QGyPCphZFRTQSTB8Trove5zSpXvo5qGSVzB5Q+FwAeqBktlstWZsscovaYpvK7Vdl7NkikaHMexgIcHeoK7EUFbV1TzdTsAa5sbfFxVgCyKers2SdlI32c8XPX4lOm4paShnnY26+90bCQP/ACFVvmbuZKyYF9zp34gMK7OWdTxxC8vMTSAPcl6LDRO7RWZFCWC5pEL2RphLKGufHNwE7fE5FOzv1kySMfET950T3l7XBVLIaanidJLI43BrGi8lUVKyy2RVNRKQw42RDyakL83Uoj/sbjndtsGSnAr5qE1M8jTGHMcWAldmaXfP/tYRunDiCFakr2vibVRfiwEOICkkjFtUMVTW85ETDhVBFaNbWY33Sm9kVzixWdRUs9ZSvpaaNkTQ8umC/Eouj17UOz8rVdWL+6qP/wBTV/eY/wBDl+9OjdTC1ayKd3tHM8sJUzJYZWB0b2G8ODvUFTtdXV08b3RjzMijUZYa6te9vM2ML83aX+oqAiKamfC9/GNdm7Nkl3EbKhpYC4StFxxhdlrOENMwuwlgBf7MHErsPFY/dY2MdUAg4yf1PqObZx6fUgGSekmjYD4XuaQFSQRUndZo8TJg43yJoc1zS1zT4EO9FUhoMpl7sX7p8TuQqrtEU7xc8PnAap2VNrf2IZ5YL+r1MyltimZcyU+SSP2ep6zu7XXNEE4cxWk6KIOF8k8u8kw+zGhUx/Rgg3QH6wP4l6tEzRknBLTS7p/8TSqmsFK/z94nDI1OKi16pobLMBcxjP3GKZlLbEEeASHySNHgx6qa1tMz7g7vUB0StAiEPBkY6XeTTAfbhVOwSSUHd6WLytGG64KmijdVPhMOB+NMDo5GOY5p9Q4XEKd08AlL6Z8MuCeIFWnVQUZuEneZw1t38CZJaAZHC8zMIY+OpYg6prqqCdlaHuDnXOlJj/zDVXmojBO4mil3c4BVVUmmxjGKmoAYFM2ptKrINVVe4Hgxns0KoZFX0VRv4d55X/dIc0lVZhsmntGCSohZV3s3QlBftnigtGQX1MDzgErx9gcFU2juQLmXTtcq4iLGHyxbzezy+zT7KRlLV0Lg+iP6jcIuwkexCNZDTueQe7TAxK3XtkER3Mb5N7IXdGBU09JZTq2H9IvZIN1NDG9UkL6AClveZA3yFFUsMrKWCdk2OTBcZCCgN9S0FNDIAbwHxsAOoUDJJ4a4SyBz8ADcJCiZHVwiTG1hxAXkkKpMBmntN7JPaSMuIVfVyUYJDO6zAsVaYYXODpy9+9nkChbFTU8QjiY30DVSRMs581a8PbMC66YnCmnC770Uo80UjfBwVdJNCfLLSS4C4c7SrRlZTA+ermvDOIaEMb/PUTnzSv8AqObZx6ftQf8AX9nJhwie1P2BzbOPT4JQwRyG++RsbQ84uLR+wObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZ4BPOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKN4F/p8mGz//xAAUEQEAAAAAAAAAAAAAAAAAAACg/9oACAECAQE/ABK//8QAFBEBAAAAAAAAAAAAAAAAAAAAoP/aAAgBAwEBPwASv//Z', 'base64'), 
    options => '{"model": "my_paddle_ocr_model"}'
);
+
+__OUTPUT__
+     create_model
+---------------------
+ my_paddle_ocr_model
+(1 row)
+
+ part_id |       text
+---------+------------------
+       0 | Tesseract sample
+(1 row)
 ```

 - The `model` is the name of the created model to use for OCR. The model must support the `perform_ocr` operation.

 !!! Tip
+This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](./concepts#unnesting) for more detail.
+!!!
+
+!!! Note
 Limitations of the model still apply. For example, the [NVIDIA NIM Image OCR API](https://docs.nvidia.com/nim/ingestion/table-extraction/latest/api-reference.html) model provider only supports `png` and `jpeg` image inputs.
 !!!

diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx
index c148f6c5463..264a762892c 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx
@@ -10,7 +10,7 @@ description: "Usage of preparers in AI Accelerator Pipelines."
 The source data for a preparer can come from a Postgres table or a PGFS volume. Given the different nature of the data sources and the options required for each, you use different functions to create them.

 !!! Note
-You can customze te behavior of the data preparation operation for the preparer with different options. The API for these options is identical between the primitives and the preparer, so you can prototype options with the `aidb.chunk_text()` primitive for use with a scalable preparer that performs the `ChunkText` operation. Learn more in [Primitives](./primitives).
+You can customize the behavior of the data preparation operation for the preparer with different options. The API for these options is identical between the primitives and the preparer, so you can prototype options with the `aidb.chunk_text()` primitive for use with a scalable preparer that performs the `ChunkText` operation. Learn more in [Primitives](./primitives).
 !!!

 ## Preparer for a table data source

@@ -32,6 +32,10 @@ aidb.create_table_preparer(
 )
 ```

+!!! Tip
+The `source_key_column` must be a unique key for the source data. If the data source is the output of a Preparer that [transforms the data shape](./concepts#unnesting) with a `part_id` column, make sure to use the new `unique_id` column.
+!!!
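+
+For example, once the preparer in the following example exists, you can chain a second preparer onto its output table by keying on `unique_id`. This is a minimal sketch: the `SummarizeText` destination table and the `my_t5_model` model name are illustrative assumptions, not part of this page.
+
+```sql
+SELECT aidb.create_table_preparer(
+    name => 'chunk_summarizer',
+    operation => 'SummarizeText',
+    source_table => 'chunked_data_destination_table', -- output table of the ChunkText preparer below
+    source_data_column => 'chunk',
+    destination_table => 'summarized_chunks',
+    destination_data_column => 'summary',
+    source_key_column => 'unique_id', -- the generated unique key, not the original 'id'
+    destination_key_column => 'id',
+    options => '{"model": "my_t5_model"}'::JSONB -- hypothetical summarization model
+);
+```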
+ ### Example: Creating a preparer ``` sql @@ -41,7 +45,7 @@ SELECT aidb.create_table_preparer( source_table => 'test_source_table', source_data_column => 'content', destination_table => 'chunked_data_destination_table', - destination_data_column => 'chunks', + destination_data_column => 'chunk', source_key_column => 'id', destination_key_column => 'id', options => '{"desired_length": 100}'::JSONB -- Configuration for the ChunkText operation @@ -73,7 +77,7 @@ SELECT aidb.create_volume_preparer( operation => 'ChunkText', source_volume_name => 'test_volume', destination_table => 'chunked_data_destination_table', - destination_data_column => 'chunks', + destination_data_column => 'chunk', destination_key_column => 'id', options => '{"desired_length": 100}'::JSONB -- Configuration for the ChunkText operation ); @@ -108,7 +112,7 @@ SELECT * FROM aidb.preparers; __OUTPUT__ id | name | operation | destination_schema | destination_table | destination_key_column | destination_data_column | options | source_type | source_schema | source_table | source_data_column | source_key_column | source_volume_name ----+---------------+-----------+--------------------+--------------------------------+------------------------+-------------------------+-------------------------+-------------+---------------+-------------------+--------------------+-------------------+-------------------- - 1 | test_preparer | ChunkText | public | chunked_data_destination_table | id | chunks | {"desired_length": 100} | Table | public | test_source_table | content | id | + 1 | test_preparer | ChunkText | public | chunked_data_destination_table | id | chunk | {"desired_length": 100} | Table | public | test_source_table | content | id | (1 row) ``` diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/knowledge_bases.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/knowledge_bases.mdx index 1588f8e5126..ae5dc00975d 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/knowledge_bases.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/knowledge_bases.mdx @@ -135,7 +135,7 @@ Creates a knowledge base for a given table. | source_table | regclass | Required | Name of the table to use as source. | | source_data_column | TEXT | Required | Column name in source table to use. | | source_data_format | [aidb.PipelineDataFormat](#aidbpipelinedataformat) | Required | Format of data in that column ("Text", "Image", "PDF"). | -| source_key_column | TEXT | 'id' | Column to use as key to reference the rows. | +| source_key_column | TEXT | 'id' | Unique column in the source table to use as key to reference the rows. | | vector_table | TEXT | NULL | | | vector_data_column | TEXT | 'embeddings' | | | vector_key_column | TEXT | 'id' | | diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/pgfs.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/pgfs.mdx index 296560f64fa..48366f40eed 100644 --- a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/pgfs.mdx +++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/pgfs.mdx @@ -5,7 +5,7 @@ description: "Reference documentation for EDB Postgres AI - AI Accelerator Pipel deepToC: true --- -This reference documentation for EDB Postgres AI - AI Accelerator Pipelines PGFS includes information on the functions and views available in the [pgfs](../pgfs) extension. These functions give aidb access to S3-compatible file systems and local file systems. 
+This reference documentation for EDB Postgres AI - AI Accelerator Pipelines PGFS includes information on the functions and views available in the [pgfs](../pgfs) extension. These functions give aidb access to S3-compatible file systems, Google Cloud Storage buckets, and local file systems.

## pgfs

@@ -44,13 +44,13 @@ Creates a storage location in the database.

#### Parameters

-| Parameter     | Type  | Default | Description                                              |
-|---------------|-------|---------|----------------------------------------------------------|
-| `name`        | text  |         | Name for storage location                                |
-| `url`         | text  |         | URL for this storage location (prefix `s3:` or `file:`)  |
-| `msl_id`      | uuid  |         | Unused                                                   |
-| `options`     | jsonb |         | Options for the storage location                         |
-| `credentials` | jsonb |         | Credentials for the storage location                     |
+| Parameter     | Type  | Default | Description                                                      |
+|---------------|-------|---------|------------------------------------------------------------------|
+| `name`        | text  |         | Name for storage location                                        |
+| `url`         | text  |         | URL for this storage location (prefix `s3:`, `gs:`, or `file:`)  |
+| `msl_id`      | uuid  |         | Unused                                                           |
+| `options`     | jsonb |         | Options for the storage location                                 |
+| `credentials` | jsonb |         | Credentials for the storage location                             |

#### Example

@@ -64,13 +64,13 @@ Creates a storage location in the database and associates it with a foreign tabl

#### Parameters

-| Parameter               | Type | Default | Description                                              |
-|-------------------------|------|---------|----------------------------------------------------------|
-| `storage_location_name` | text |         | Name for storage location                                |
-| `url`                   | text |         | URL for this storage location (prefix `s3:` or `file:`)  |
-| `msl_id`                | uuid |         | Unused                                                   |
-| `options`               | json |         | Options for the storage location                         |
-| `credentials`           | json |         | Credentials for the storage location                     |
+| Parameter               | Type | Default | Description                                                      |
+|-------------------------|------|---------|------------------------------------------------------------------|
+| `storage_location_name` | text |         | Name for storage location                                        |
+| `url`                   | text |         | URL for this storage location (prefix `s3:`, `gs:`, or `file:`)  |
+| `msl_id`                | uuid |         | Unused                                                           |
+| `options`               | json |         | Options for the storage location                                 |
+| `credentials`           | json |         | Credentials for the storage location                             |

#### Example

diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/preparers.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/preparers.mdx
index 84144aad956..4835af8c5f6 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/preparers.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/preparers.mdx
@@ -58,7 +58,7 @@ Creates a preparer with a source data table.
 | source_data_column | TEXT | Required | Column in the source table containing the raw data |
 | destination_table | TEXT | Required | Name of the destination table |
 | destination_data_column | TEXT | Required | Column in the destination table for processed data |
-| source_key_column | TEXT | 'id' | Column to use as key to reference the rows |
+| source_key_column | TEXT | 'id' | Unique column in the source table to use as key to reference the rows. |
 | destination_key_column | TEXT | 'id' | Key column in the destination table that references the `source_key_column` |
 | options | JSONB | '{}'::JSONB | Configuration options for the data preparation operation. Uses the same API as the [data preparation primitives](../preparers/primitives.mdx). |

diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
new file mode 100644
index 00000000000..3590058bdbe
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
@@ -0,0 +1,31 @@
+---
+title: AI Accelerator - Pipelines 4.1.0 release notes
+navTitle: Version 4.1.0
+originalFilePath: advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
+editTarget: originalFilePath
+---
+
+Released: 19 May 2025
+
+This is a minor release that includes enhancements to the preparer pipeline and the model API providers.
+
+## Highlights
+
+- Automatic unnesting of Preparer results for operations that transform the shape of data.
+- Batch processing for embeddings with external models.
+
+## Enhancements
+
+<table class="table w-100">
+<thead>
+<tr><th>Description</th><th>Addresses</th></tr>
+</thead>
+<tbody>
+<tr><td>
+Automatic unnesting of Preparer results for operations that transform the shape of data.
+<br/><br/>
+The preparer pipeline for operations that transform the shape of their input data with an additional dimension now unnests the result collections.
+This allows the output of preparers to be consumed much more easily by other preparers or knowledge bases.
+Unnested results are returned with a new <code>part_id</code> column to track the new dimension. There is also a new <code>unique_id</code> column to uniquely identify the combination of the source key and <code>part_id</code>.
+</td><td></td></tr>
+<tr><td>
+Batch processing for embeddings with external models.
+<br/><br/>
+The external model providers <code>embeddings</code>, <code>openai_embeddings</code>, and <code>nim_embeddings</code> can now send a batch of inputs in a single request, rather than multiple concurrent requests.
+This can improve performance and hardware utilization. The feature is fully configurable and can also be disabled.
+</td><td></td></tr>
+<tr><td>
+Change output column for <code>chunk_text()</code> primitive function.
+<br/><br/>
+The enumeration column returned by the <code>chunk_text()</code> primitive function is now <code>part_id</code> instead of <code>chunk_id</code> to match the other Preparer primitives/operations.
+</td><td></td></tr>
+</tbody>
+</table>

diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx
index dbc87bf6dfd..a46870bd873 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx
@@ -4,6 +4,7 @@ navTitle: Release notes
 description: Release notes for EDB Postgres AI - AI Accelerator
 indexCards: none
 navigation:
+  - ai-accelerator_4.1.0_rel_notes
   - ai-accelerator_4.0.1_rel_notes
   - ai-accelerator_4.0.0_rel_notes
   - ai-accelerator_3.0.1_rel_notes
@@ -22,6 +23,7 @@ The EDB Postgres AI - AI Accelerator describes the latest version of AI Accelera

 | AI Accelerator version | Release Date |
 |---|---|
+| [4.1.0](./ai-accelerator_4.1.0_rel_notes) | 19 May 2025 |
 | [4.0.1](./ai-accelerator_4.0.1_rel_notes) | 09 May 2025 |
 | [4.0.0](./ai-accelerator_4.0.0_rel_notes) | 05 May 2025 |
 | [3.0.1](./ai-accelerator_3.0.1_rel_notes) | 03 Apr 2025 |

diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
new file mode 100644
index 00000000000..edbcf329ba0
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
@@ -0,0 +1,36 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/EnterpriseDB/docs/refs/heads/develop/tools/automation/generators/relgen/relnote-schema.json
+product: AI Accelerator - Pipelines
+version: 4.1.0
+date: 19 May 2025
+intro: |
+  This is a minor release that includes enhancements to the preparer pipeline and the model API providers.
+highlights: |
+  - Automatic unnesting of Preparer results for operations that transform the shape of data.
+  - Batch processing for embeddings with external models.
+relnotes:
+- relnote: Automatic unnesting of Preparer results for operations that transform the shape of data.
+  details: |
+    The preparer pipeline for operations that transform the shape of their input data with an additional dimension now unnests the result collections.
+    This allows the output of preparers to be consumed much more easily by other preparers or knowledge bases.
+    Unnested results are returned with a new `part_id` column to track the new dimension. There is also a new `unique_id` column to uniquely identify the combination of the source key and `part_id`.
+  jira: "AID-410"
+  addresses: ""
+  type: Enhancement
+  impact: High
+
+- relnote: Change output column for `chunk_text()` primitive function.
+  details: |
+    The enumeration column returned by the `chunk_text()` primitive function is now `part_id` instead of `chunk_id` to match the other Preparer primitives/operations.
+  jira: "AID-410"
+  addresses: ""
+  type: Enhancement
+  impact: Low
+
+- relnote: Batch processing for embeddings with external models.
+  details: |
+    The external model providers `embeddings`, `openai_embeddings`, and `nim_embeddings` can now send a batch of inputs in a single request, rather than multiple concurrent requests.
+    This can improve performance and hardware utilization. The feature is fully configurable and can also be disabled.
+  jira: "AID-419"
+  addresses: ""
+  type: Enhancement
+  impact: Medium

diff --git a/install_template/templates/products/postgres-enterprise-manager-server/base.njk b/install_template/templates/products/postgres-enterprise-manager-server/base.njk
index 2d70938e492..cf262425e2d 100644
--- a/install_template/templates/products/postgres-enterprise-manager-server/base.njk
+++ b/install_template/templates/products/postgres-enterprise-manager-server/base.njk
@@ -93,7 +93,7 @@ For more details, see [Configuring the PEM server on Linux](../configuring_the_p

 !!! Note

-    - The operating system user pem is created while installing the PEM server. The pem application data and the session is saved to this user's home directory.
+    - The operating system user pem is created while installing the PEM server. The PEM application data and the session are saved to this user's home directory.

 ## Supported locales

diff --git a/product_docs/docs/efm/5/04_configuring_efm/01_cluster_properties.mdx b/product_docs/docs/efm/5/04_configuring_efm/01_cluster_properties.mdx
index 77270721298..e6bd4e446ca 100644
--- a/product_docs/docs/efm/5/04_configuring_efm/01_cluster_properties.mdx
+++ b/product_docs/docs/efm/5/04_configuring_efm/01_cluster_properties.mdx
@@ -142,8 +142,8 @@ Use the properties in the `efm.properties` file to specify connection, administr
 | [log.dir](#log_dir) | | | | If not specified, defaults to '/var/log/efm-<version>' |
 | [syslog.host](#syslog_logging) | | | localhost | |
 | [syslog.port](#syslog_logging) | | | 514 | |
-| [syslog.protocol](#syslog_logging) | | | | |
-| [syslog.facility](#syslog_logging) | | | UDP | |
+| [syslog.protocol](#syslog_logging) | | | UDP | |
+| [syslog.facility](#syslog_logging) | | | LOCAL1 | |
 | [file.log.enabled](#logtype_enabled) | Y | Y | true | |
 | [syslog.enabled](#logtype_enabled) | Y | Y | false | |
 | [jgroups.loglevel](#loglevel) | | | info | |

diff --git a/product_docs/docs/efm/5/efm_quick_start/index.mdx b/product_docs/docs/efm/5/efm_quick_start/index.mdx
index 6002efe4fb7..95206902d7d 100644
--- a/product_docs/docs/efm/5/efm_quick_start/index.mdx
+++ b/product_docs/docs/efm/5/efm_quick_start/index.mdx
@@ -21,7 +21,7 @@ Using EDB Postgres Advanced Server as an example (Failover Manager also works wi
 - Install Failover Manager on each primary and standby node. During EDB Postgres Advanced Server installation, you configured an EDB repository on each database host. You can use the EDB repository and the `yum install` command to install Failover Manager on each node of the cluster:

 ```shell
-  yum install edb-efm49
+  yum install edb-efm50
 ```

 During the installation process, the installer creates a user named efm that has privileges to invoke scripts that control the Failover Manager service for clusters owned by enterprisedb or postgres. The example that follows creates a cluster named `efm`.

diff --git a/product_docs/docs/epas/13/epas_guide/03_database_administration/02_index_advisor/index.mdx b/product_docs/docs/epas/13/epas_guide/03_database_administration/02_index_advisor/index.mdx
index 0efd15b776d..70e73546fec 100644
--- a/product_docs/docs/epas/13/epas_guide/03_database_administration/02_index_advisor/index.mdx
+++ b/product_docs/docs/epas/13/epas_guide/03_database_administration/02_index_advisor/index.mdx
@@ -18,7 +18,7 @@ There are three ways to use Index Advisor to analyze SQL queries:

 - Provide queries at the EDB-PSQL command line that you want Index Advisor to analyze.

-- Access Index Advisor through the Postgres Enterprise Manager client.
When accessed via the PEM client, Index Advisor works with SQL Profiler, providing indexing recommendations on code captured in SQL traces. For more information about using SQL Profiler with PEM, see the [Using the SQL Profiler](/pem/latest/profiling_workloads/using_sql_profiler.mdx) and [Using the Index Advisor](03_using_index_advisor.mdx).
+- Access Index Advisor through the Postgres Enterprise Manager client. When accessed via the PEM client, Index Advisor works with SQL Profiler, providing indexing recommendations on code captured in SQL traces. For more information about using SQL Profiler with PEM, see [Using SQL Profiler](/pem/latest/profiling_workloads/using_sql_profiler.mdx) and [Using Index Advisor](03_using_index_advisor.mdx).

 Index Advisor will attempt to make indexing recommendations on `INSERT`, `UPDATE`, `DELETE` and `SELECT` statements. When invoking Index Advisor, you supply the workload in the form of a set of queries (if you are providing the command in an SQL file) or an `EXPLAIN` statement (if you are specifying the SQL statement at the psql command line). Index Advisor displays the query plan and estimated execution cost for the supplied query, but does not actually execute the query.

diff --git a/product_docs/docs/epas/14/epas_guide/03_database_administration/02_index_advisor/index.mdx b/product_docs/docs/epas/14/epas_guide/03_database_administration/02_index_advisor/index.mdx
index fcc282d6b6b..b786a3510e3 100644
--- a/product_docs/docs/epas/14/epas_guide/03_database_administration/02_index_advisor/index.mdx
+++ b/product_docs/docs/epas/14/epas_guide/03_database_administration/02_index_advisor/index.mdx
@@ -16,7 +16,7 @@ You can use Index Advisor to analyze SQL queries in any of these ways:

 - Invoke the Index Advisor utility program, supplying a text file containing the SQL queries that you want to analyze. Index Advisor generates a text file with `CREATE INDEX` statements for the recommended indexes.
 - Provide queries at the EDB-PSQL command line that you want Index Advisor to analyze.

-- Access Index Advisor through the Postgres Enterprise Manager (PEM) client. When accessed using the PEM client, Index Advisor works with SQL Profiler, providing indexing recommendations on code captured in SQL traces. For more information about using SQL Profiler with PEM, see [Using the SQL Profiler](/pem/latest/profiling_workloads/using_sql_profiler.mdx) and [Using the Index Advisor](03_using_index_advisor.mdx).
+- Access Index Advisor through the Postgres Enterprise Manager (PEM) client. When accessed using the PEM client, Index Advisor works with SQL Profiler, providing indexing recommendations on code captured in SQL traces. For more information about using SQL Profiler with PEM, see [Using SQL Profiler](/pem/latest/profiling_workloads/using_sql_profiler.mdx) and [Using Index Advisor](03_using_index_advisor.mdx).

 Index Advisor attempts to make indexing recommendations on `INSERT`, `UPDATE`, `DELETE`, and `SELECT` statements.
When invoking Index Advisor, you supply the workload in the form of either: - If you're providing the command in an SQL file, a set of queries diff --git a/product_docs/docs/epas/15/managing_performance/02_index_advisor/index_advisor_overview.mdx b/product_docs/docs/epas/15/managing_performance/02_index_advisor/index_advisor_overview.mdx index 070db8d1eeb..6aae0ed585c 100644 --- a/product_docs/docs/epas/15/managing_performance/02_index_advisor/index_advisor_overview.mdx +++ b/product_docs/docs/epas/15/managing_performance/02_index_advisor/index_advisor_overview.mdx @@ -9,7 +9,7 @@ You can use Index Advisor to analyze SQL queries in any of these ways: - Invoke the Index Advisor utility program, supplying a text file containing the SQL queries that you want to analyze. Index Advisor generates a text file with `CREATE INDEX` statements for the recommended indexes. - Provide queries at the EDB-PSQL command line that you want Index Advisor to analyze. -- Access Index Advisor through the Postgres Enterprise Manager (PEM) client. When accessed using the PEM client, Index Advisor works with SQL Profiler, providing indexing recommendations on code captured in SQL traces. For more information about using SQL Profiler and Index Advisor with PEM, see [Using the SQL profiler](/pem/latest/profiling_workloads/using_sql_profiler.mdx) and [Using the Index Advisor](03_using_index_advisor.mdx). +- Access Index Advisor through the Postgres Enterprise Manager (PEM) client. When accessed using the PEM client, Index Advisor works with SQL Profiler, providing indexing recommendations on code captured in SQL traces. For more information about using SQL Profiler and Index Advisor with PEM, see [Using SQL profiler](/pem/latest/profiling_workloads/using_sql_profiler.mdx) and [Using Index Advisor](03_using_index_advisor.mdx). Index Advisor attempts to make indexing recommendations on `INSERT`, `UPDATE`, `DELETE`, and `SELECT` statements. When invoking Index Advisor, you supply the workload in the form of either: diff --git a/product_docs/docs/pem/10/certificates/index.mdx b/product_docs/docs/pem/10/certificates/index.mdx index 80376810b81..24f35c88753 100644 --- a/product_docs/docs/pem/10/certificates/index.mdx +++ b/product_docs/docs/pem/10/certificates/index.mdx @@ -18,10 +18,10 @@ PEM uses SSL certificates: - To secure requests to the [web server](#web-server-certificates), which provides the user interface and REST API. - To secure and authenticate the [PEM agent connections to the PEM backend database](#pem-backend-database-server-and-agent-connection-certificates). -## Web-server certificates +## Web server certificates PEM generates an SSL certificate and key file for the web server during initial configuration. -Because the certificate is self-signed, users will see a warning that the site is insecure when they open the PEM web application URL in their browser. +Because the certificate is self-signed, a warning states that the site is insecure when users open the PEM web application URL in a browser. To increase security and remove this warning, you can replace the self-signed SSL certificate with a certificate signed by a trusted certificate authority. 
@@ -37,13 +37,13 @@ Change the server name and file paths in the configuration file to match your ce

 ```text
 server {
    # lines omitted here
-   server_name yourdomain.com;
+   server_name <your_domain_name>;
    # lines omitted here
 }

 server {
    # lines omitted here
-   server_name yourdomain.com;
+   server_name <your_domain_name>;

    ssl_certificate /path/to/your_domain_name.crt
    ssl_certificate_key /path/to/your_private.key

@@ -70,12 +70,12 @@ For a worked example, see [Replacing httpd self-signed SSL certificates](https:/

 ## PEM backend database server and agent connection certificates

 PEM implements secured SSL/TLS connections between PEM agents and the backend database.
-Each agent has an SSL certificate which is used both to encrypt its communication with the server and to authenticate with the server in place of a password.
+Each agent has an SSL certificate that's used both to encrypt its communication with the server and to authenticate with the server in place of a password.

-PEM uses the sslutils extension to allow the PEM server to generate and sign SSL certificates and keys. When a new agent is registered, the PEM server automatically issues it with a certificate.
+PEM uses the sslutils extension to allow the PEM server to generate and sign SSL certificates and keys. When a new agent is registered, the PEM server issues it a certificate.
 Certificates issued by the PEM server are signed by the PEM server, meaning the PEM server is acting as a certificate authority (CA).

-If the above is not suitable, you can use SSL certificates and keys generated outside of PEM and signed by a trusted CA.
+If this approach isn't suitable, you can use SSL certificates and keys generated outside of PEM and signed by a trusted CA.
 For more information, see [Trusted CA certificates and keys](#use-certificates-and-keys-signed-by-trusted-ca).

 ### Certificates and key files on the PEM server

@@ -90,7 +90,7 @@ During initial configuration of the PEM server, the following files are generate
 - `server.key`

 The `ca_certificate.crt` and `ca_key.key` files are used by the PEM server to sign certificates generated for agents during agent registration.
-They are also used to sign `server.crt`. Unless replaced manually, the 'ca_certificate.crt' file is a self-signed certificate because is acting as the root CA.
+They're also used to sign `server.crt`. Unless replaced manually, the 'ca_certificate.crt' file is a self-signed certificate because it's acting as the root CA.

 The `root.crt` file is a copy of the `ca_certificate.crt` file.
 The `ssl_ca_file` parameter in the `postgresql.conf` file points to this file.

@@ -100,33 +100,33 @@ The `ssl_crl_file` parameter in the `postgresql.conf` file points to this file.
 The `server.crt` file is the signed certificate for the PEM server, and the `server.key` file is the private key to the certificate.
 The `ssl_cert_file` parameter in the `postgresql.conf` file points to this file.

-These files are automatically renewed when they near their expiry date, see [PEM CA certificate renewal](#pem-certificate-renewal).
+These files are automatically renewed when they near their expiry date. See [PEM CA certificate renewal](#pem-certificate-renewal).

 ### Certificates and key files for PEM agents

 Each agent's SSL certificate and keys are generated during [agent registration](../registering_agent).
 The PEM agent connects to the PEM backend database server using the libpq interface, acting as a client of the backend database server.

-The PEM agent connect to the server using the `cert` auth method and with ssl enabled.
-This means that the connection is encrypted using the agent's key and authenticated using the agent's certificate (rather than a password, for example). +The PEM agent connects to the server using the `cert` auth method and with ssl enabled. +This means that the connection is encrypted using the agent's key and authenticated using the agent's certificate instead of, for example, a password. Each agent has a unique identifier, and the agent certificates and keys have the corresponding identifier. -If required, you can use the same certificate for all agents rather than one certificate per agent. For more information, see [Generate common agent certificate and key pair](#generate-a-common-agent-certificate-and-key-pair). +If required, you can use the same certificate for all agents rather than one certificate per agent. For more information, see [Generate a common agent certificate and key pair](#generate-a-common-agent-certificate-and-key-pair). -For more information on using the SSL certificates to connect in Postgres, see [Securing TCP/IP connections with SSL](https://www.postgresql.org/docs/current/ssl-tcp.html). +For more information on using the SSL certificates to connect in Postgres, see [Securing TCP/IP connections with SSL](https://www.postgresql.org/docs/current/ssl-tcp.html) in the Postgres documentation. ### PEM certificate renewal -SSL certificates have an expiry date. If you are using certificates and keys generated by PEM, they are automatically replaced before expiring. +SSL certificates have an expiry date. If you're using certificates and keys generated by PEM, PEM replaces them before they expire. The PEM agent installed with the PEM server monitors the expiration date of the `ca_certificate.crt` file. When the certificate is about to expire, PEM: -- Makes a backup of the existing certificate files -- Creates new certificate files and appends the new CA certificate file to the `root.crt` file on the PEM server -- Creates a job to renew the certificate file for any active agents -- Restarts the PEM server +- Makes a backup of the existing certificate files. +- Creates new certificate files and appends the new CA certificate file to the `root.crt` file on the PEM server. +- Creates a job to renew the certificate file for any active agents. +- Restarts the PEM server. !!! Important -If you choose to either provide your own certificates, or use a single certificate for all agents, you should disable the automatic renewal job. +If you choose to provide your own certificates or use a single certificate for all agents, disable the automatic renewal job. On the PEM server, execute the following SQL: ```sql @@ -136,7 +136,7 @@ WHERE jobname = 'Check CA certificate expiry'; ``` !!! -If you need to regenerate the server or agent certificates manually, please see: +If you need to regenerate the server or agent certificates manually, see: - [Regenerating the server SSL certificates](replacing_ssl_certificates) - [Regenerating agent SSL certificates](regenerating_agent_certificates) @@ -146,7 +146,7 @@ By creating and using a single Postgres user for all PEM agents rather than one Create a user, generate an agent certificate and key pair, and use them for all PEM agents. -1. Create one common agent user in the PEM backend database. Grant the `pem_agent` role to the user. +1. Create one common agent user in the PEM backend database. Grant the pem_agent role to the user. 
 ```shell
 # Running as enterprisedb

@@ -176,7 +176,7 @@ Create a user, generate an agent certificate and key pair, and use them for all
     openssl x509 -req -days 365 -in agent.csr -CA ca_certificate.crt -CAkey ca_key.key -CAcreateserial -out agent.crt
 ```

-1. Change the permissions on the `agent.crt` and `agent.key` file:
+1. Change the permissions on the `agent.crt` and `agent.key` files:

 ```shell
 chmod 600 agent.crt agent.key

@@ -209,7 +209,7 @@ Create a user, generate an agent certificate and key pair, and use them for all

   - To replace the agent certificate and key pair with the registered agent.

-    a. Edit the `agent_user`, `agent_ssl_key`, and `agent_ssl_crt` parameters in `agent.cfg` file of the agent host:
+    a. Edit the `agent_user`, `agent_ssl_key`, and `agent_ssl_crt` parameters in the `agent.cfg` file of the agent host:

 ```shell
 vi /usr/edb/pem/agent/etc/agent.cfg

@@ -262,7 +262,7 @@ After obtaining the trusted CA certificates and keys, replace the [server](#repl

 1. Ask your CA to sign the CSR and generate the server certificate for you.

-1. Verify the details of the new server certificate aren't tampered with and match your provided details:
+1. Verify that the details of the new server certificate aren't tampered with and match your provided details:

 ```shell
 openssl x509 -noout -text -in server.crt

@@ -277,8 +277,8 @@ After obtaining the trusted CA certificates and keys, replace the [server](#repl
 1. If the trusted CA doesn't provide CRL, disable CRL usage by the server. To disable the CRL usage, comment the `ssl_crl_file` parameter in the `postgresql.conf` file.

     !!! Note
-    If you accidentally leave a CRL from a previous CA in place and do not comment out `ssl_crl_file`, the server will start but authentication will fail with an SSL error message `tlsv1 alert unknown ca`.
-    The error doesn't specify that the CRL is the cause, so this can be difficult to debug if encountered out of context.
+    If you leave a CRL from a previous CA in place and don't comment out `ssl_crl_file`, the server will start. However, authentication will fail with an SSL error message: `tlsv1 alert unknown ca`.
+    The error doesn't specify that the CRL is the cause, so this issue can be difficult to debug if encountered out of context.

 1. Copy the new `root.crt`, `server.key`, and `server.crt` files to the data directory of the backend database server:

 ```shell
 cp root.crt server.key server.crt /var/lib/edb/as/data
 ```

-1. Change the owner and permissions of the new certificates and key files to be the same as the data directory:
+1. Change the owner and permissions of the new certificates and key files to match those of the data directory:

 ```shell
 cd /var/lib/edb/as/data/

@@ -369,7 +369,7 @@ Replace the agent SSL certificates only after replacing the server certificates
    Use the Services applet to restart the PEM agent. The PEM agent service is named Postgres Enterprise Manager Agent. Select the service name in the Services dialog box, and select **Restart the service**.

 !!! Note
-For agents registered after following the process above you can provide a certificate to the agent at the time of registration as shown in the [second example](/pem/latest/registering_agent/#overriding-default-configurations---examples).
+For agents registered after following the preceding process, you can provide a certificate to the agent at the time of registration as shown in the [second example](/pem/latest/registering_agent/#overriding-default-configurations---examples).
 !!!

 !!!note
@@ -393,7 +393,7 @@ This command returns `agent1.crt: OK` on success or an explanatory message on fa

 ### Make a test connection to the PEM backend database

-To verify whether the agent user can connect using a certificate, on the server where the agent is located, execute the following commands as root:
+To verify whether the agent user can connect using a certificate, as root on the server where the agent is located, execute:

 ```shell
 PGHOST=<PEM_SERVER>
 PGPORT=<PEM_PORT>
 PGUSER=agent<AGENT_ID>
 PGSSLCERT=<AGENT_CERTIFICATE_FILE>
 PGSSLKEY=<AGENT_KEY_FILE>
 PGSSLMODE=verify-ca
 export PGHOST PGPORT PGUSER PGSSLCERT PGSSLKEY PGSSLMODE
 <PSQL_PATH> -A -t -c "SELECT version()"
 ```
+
 Where:
 - `<PSQL_PATH>` is the full path to the psql executable, for example `/usr/edb/as15/bin/psql`.
 - `<PEM_SERVER>` is the hostname or IP address of PEM server.
 - `<AGENT_ID>` is the ID of the agent you're testing, as defined in the file `/usr/edb/pem/agent/etc/agent.cfg`.

@@ -414,7 +415,7 @@
 !!! Note
-If you used the instructions in [Generate a common agent certificate and key pair](#generate-a-common-agent-certificate-and-key-pair)
+If you used the instructions in [Generate a common agent certificate and key pair](#generate-a-common-agent-certificate-and-key-pair),
 you must set `PGUSER` to the common agent username.
 !!!

diff --git a/product_docs/docs/pem/10/certificates/regenerating_agent_certificates.mdx b/product_docs/docs/pem/10/certificates/regenerating_agent_certificates.mdx
index aa6bf4c33a8..481229cd7e3 100644
--- a/product_docs/docs/pem/10/certificates/regenerating_agent_certificates.mdx
+++ b/product_docs/docs/pem/10/certificates/regenerating_agent_certificates.mdx
@@ -6,17 +6,17 @@ redirects:
 ---

 !!! Important
-These steps are automatically performed by default when the certificates are nearing expiry.
-These instructions are provided for completeness incase you need to manually regenerate the PEM certificates and keys.
+PEM performs these steps by default when the certificates are nearing expiry.
+These instructions are provided for completeness in case you need to manually regenerate the PEM certificates and keys.
 !!!

 You need to regenerate the agent certificates and key files:

-- If the PEM server certificates are regenerated
-- If the PEM agent certificates are near expiring
+- If the PEM server certificates are regenerated.
+- If the PEM agent certificates are nearing expiry.

 You must regenerate a certificate and a key for each agent interacting with the PEM server and copy it to the agent.

-Each agent has a unique identifier that's stored in the pem.agent table of the pem database. You must replace the certificate and key files with the certificate or key files that corresponds to the agent's identifier.
+Each agent has a unique identifier that's stored in the pem.agent table of the `pem` database. You must replace the certificate and key files with the certificate or key files that correspond to the agent's identifier.

 Prerequisites:

 - PEM server has certificates.

@@ -66,9 +66,9 @@ To generate a PEM agent certificate and key file pair:

     Where `-req` indicates the input is a CSR. The `-CA` and `-CAkey` options specify the root certificate and private key to use for signing the CSR.

-    Before generating the next certificate and key file pair, move the `agent.key` and `agent.crt` files generated in the steps 2 and 4 on their respective PEM agent host.
+    Before generating the next certificate and key-file pair, move the `agent.key` and `agent.crt` files generated in steps 2 and 4 to their respective PEM agent host.

-6. Change the permission on the new `agent.crt` and `agent.key` file:
+6. Change the permissions on the new `agent.crt` and `agent.key` files:

 ```shell
 chmod 600 agent.crt agent.key

diff --git a/product_docs/docs/pem/10/certificates/replacing_ssl_certificates.mdx b/product_docs/docs/pem/10/certificates/replacing_ssl_certificates.mdx
index 13fc858973f..7331704b016 100644
--- a/product_docs/docs/pem/10/certificates/replacing_ssl_certificates.mdx
+++ b/product_docs/docs/pem/10/certificates/replacing_ssl_certificates.mdx
@@ -8,8 +8,8 @@ If the PEM backend database server certificates are near expiring, plan to regen

 !!! Important
-By default, these steps are performed automatically when the certificates are nearing expiry.
-These instructions are provided for completeness if incase you need to manually regenerate the PEM certificates and keys.
+PEM performs these steps by default when the certificates are nearing expiry.
+These instructions are provided for completeness in case you need to manually regenerate the PEM certificates and keys.
 !!!

 To replace the SSL certificates:
@@ -91,7 +91,7 @@ To replace the SSL certificates:
     openssl genrsa -out server.key 4096
 ```

-1. Move the `server.key` to the data directory of the backend server, and change the ownership and permissions:
+1. Move `server.key` to the data directory of the backend server, and change the ownership and permissions:

 ```shell
 mv server.key /var/lib/edb/as/data
@@ -105,9 +105,9 @@ To replace the SSL certificates:
     openssl req -new -key server.key -out server.csr -subj '/C=IN/ST=MH/L=Pune/O=EDB/CN=PEM'
 ```

-    Where `-subj` is provided as per your requirements. You define `CN` asthe hostname/domain name of the PEM server host.
+    Where `-subj` is provided as per your requirements. You define `CN` as the hostname/domain name of the PEM server host.

-1. Use the `openssl x509` command to sign the CSR and generate a server certificate. Move the `server.crt` to the data directory of the backend database server:
+1. Use the `openssl x509` command to sign the CSR and generate a server certificate. Move `server.crt` to the data directory of the backend database server:

 ```shell
 openssl x509 -req -days 365 -in server.csr -CA ca_certificate.crt -CAkey ca_key.key -CAcreateserial -out server.crt
@@ -132,4 +132,3 @@ To replace the SSL certificates:
    Restarting the backend database server restarts the PEM server.

 1. Regenerate each PEM agent's SSL certificates. For more information, see [Regenerating agent SSL certificates](regenerating_agent_certificates).
-

diff --git a/product_docs/docs/pem/10/changing_default_port.mdx b/product_docs/docs/pem/10/changing_default_port.mdx
index 2476c46ac43..1e291056d7e 100644
--- a/product_docs/docs/pem/10/changing_default_port.mdx
+++ b/product_docs/docs/pem/10/changing_default_port.mdx
@@ -2,7 +2,7 @@
 title: "Changing the default port"
 ---

-By default, the 8443 port is assigned for the web services at the time of configuration of the PEM server.
+By default, the 8443 port is assigned for the web services when the PEM server is configured.
 You can change the port after configuration by changing a few parameters in the web server configuration files.
 The names and locations of these files are platform specific.
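+
+For example, if the PEM web server uses an nginx-style configuration like the one shown in the certificate instructions, the port is set by the `listen` directives. This is a sketch under that assumption, with `8444` as an illustrative replacement port, not the literal file contents:
+
+```text
+server {
+   # lines omitted here
+   listen 8444 ssl;  # previously: listen 8443 ssl;
+   # lines omitted here
+}
+```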
diff --git a/product_docs/docs/pem/10/considerations/authentication_options/configuring_2fa_authentication.mdx b/product_docs/docs/pem/10/considerations/authentication_options/configuring_2fa_authentication.mdx index 713c93d333f..59d6fcbf4b2 100644 --- a/product_docs/docs/pem/10/considerations/authentication_options/configuring_2fa_authentication.mdx +++ b/product_docs/docs/pem/10/considerations/authentication_options/configuring_2fa_authentication.mdx @@ -8,7 +8,7 @@ redirects: --- -PEM supports two methods for 2FA: +PEM supports two methods for two-factor authentication (2FA): - Email authentication - Authenticator app (such as Google Authenticator) @@ -17,7 +17,7 @@ To enable 2FA, you can copy these settings from the `config.py` file to the `con | Parameter | Description | | ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| MFA_ENABLED | Set to `true` to enable the two-factor authentication. Default value is `false`. | +| MFA_ENABLED | Set to `true` to enable two-factor authentication. Default value is `false`. | | MFA_FORCE_REGISTRATION | Set to `true` to ask the users to register forcefully for the two-factor authentication methods at login. Default value is `false`. | | MFA_SUPPORTED_METHODS | Set to `email` to use the email authentication method (send a one-time code by email) or `authenticator` to use the TOTP-based application authentication method. | | MFA_EMAIL_SUBJECT | Set to the subject of the email for email authentication. Default value is ` - Verification Code`. | @@ -28,7 +28,7 @@ To use the email authentication method, you need to configure mail server settin PEM server can send an email using either the SMTP configurations saved in the PEM configuration or using Flask-Mail. -To send the email verification code using the internal SMTP configuration from the PEM configuration, set the parameter `MAIL_USE_PEM_INTERNAL` to `True`. If set to `False`, the following mail configuration is used to send the code on the user-specified email address: +To send the email verification code using the internal SMTP configuration from the PEM configuration, set the parameter `MAIL_USE_PEM_INTERNAL` to `True`. If set to `False`, the following mail configuration is used to send the code to the user-specified email address: - MAIL_SERVER = 'localhost' - MAIL_PORT = 25 diff --git a/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_kerberos_authentication.mdx b/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_kerberos_authentication.mdx index 8b296f9eb25..57ef712c60e 100644 --- a/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_kerberos_authentication.mdx +++ b/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_kerberos_authentication.mdx @@ -13,17 +13,17 @@ You can configure Kerberos authentication for the PEM server. The Kerberos serve - PEM server (PEM web server and PEM backend database server) - Client machine -For example, if the realm on Kerberos server is `edbpem.org`, then you can set the Kerberos server hostname to `Krb5server.edbpem.org`, the PEM server hostname to `pem.edbpem.org`, and the client's hostname to `pg12.edbpem.org`.The convention is to use the DNS domain name as the name of the realm. 
+For example, if the realm on the Kerberos server is `edbpem.org`, then you can set the Kerberos server hostname to `Krb5server.edbpem.org`, the PEM server hostname to `pem.edbpem.org`, and the client's hostname to `pg12.edbpem.org`. The convention is to use the DNS domain name as the name of the realm. ## 1. Install Kerberos, the PEM server, and the PEM backend database -Install Kerberos on the machine that functions as the authentication server. Install the PEM server on a separate machine. For more information, see [Installing the PEM Server](../../installing/). +Install Kerberos on the machine that functions as the authentication server. Install the PEM server on a separate machine. For more information, see [Installing the PEM server](../../installing/). -Install the PEM backend database (Postgres/EDB Postgres Advanced Server) on the same machine as the PEM server or on a different one. For more information, see the Installation steps on [EDB Docs website](https://www.enterprisedb.com/docs). +Install the PEM backend database (Postgres/EDB Postgres Advanced Server) on the same machine as the PEM server or on a different one. For more information, see the installation steps on the [EDB Docs website](https://www.enterprisedb.com/docs). ## 2. Add principals on Kerberos server -Add the principals for the PEM web application deployed under an Apache web server (HTTPD/Apache2) and the PEM Backend Database Server (PostgreSQL/EDB Postgres Advanced Server). +Add the principals for the PEM web application deployed under an Apache web server (HTTPD/Apache2) and the PEM backend database server (PostgreSQL/EDB Postgres Advanced Server). ```shell $ sudo kadmin.local -q "addprinc -randkey HTTP/" @@ -109,7 +109,7 @@ Restart the database server to reflect the changes: systemctl restart ``` -`POSTGRES_SERVICE_NAME` is the service name of the Postgres (PostgreSQL/EDB Postgres Advanced Server) database, for example, `postgresql-13` for PostgreSQL 13 database on a `RHEL` or Rocky Linux platforms. +`POSTGRES_SERVICE_NAME` is the service name of the Postgres (PostgreSQL/EDB Postgres Advanced Server) database, for example, `postgresql-13` for the PostgreSQL 13 database on a `RHEL` or Rocky Linux platform. ## 5. Obtain and view the initial ticket @@ -125,10 +125,10 @@ $ kinit $ klist ``` -It displays the principal along with the Kerberos ticket. +These commands display the principal along with the Kerberos ticket. !!! Note - The `USERNAME@REALM` specified here must be a database user having the pem_admin role and CONNECT privilege on `pem` database. + The `USERNAME@REALM` specified here must be a database user having the pem_admin role and CONNECT privilege on the `pem` database. ## 6. Configure the PEM server @@ -158,13 +158,13 @@ If the PEM server uses Kerberos authentication: - All the authenticated user principals are appended with the realm (USERNAME@REALM) and passed as the database user name by default. To override the default, in the `config_local.py` file, add the parameter `PEM_USER_KRB_INCLUDE_REALM` and set it to `False`. -- Restart the Apache server +- Restart the Apache server: ```shell sudo systemctl restart ``` -- Edit the entries at the top of `pg_hba.conf` to use the gss authentication method, and reload the database server.
+- Edit the entries at the top of `pg_hba.conf` to use the gss authentication method, and reload the database server: ```shell host pem +pem_user /32 gss @@ -178,25 +178,25 @@ If the PEM server uses Kerberos authentication: `POSTGRES_SERVICE_NAME` is the service name of the Postgres (PostgreSQL/EDB Postgres Advanced Server) database, for example, `postgresql-13` for PostgreSQL 13 database on a `RHEL` or Rocky Linux platforms. !!! Note - If you're using PostgreSQL or EDB Postgres Advanced Server 12 or later, then you can specify connection type as `hostgssenc` to allow only gss-encrypted connection. + If you're using PostgreSQL or EDB Postgres Advanced Server 12 or later, you can specify the connection type as `hostgssenc` to allow only gss-encrypted connections. ## 7. Browser settings -Configure the browser on the client machine to access the PEM web client to use the Spnego/Kerberos. +Configure the browser on the client machine to use SPNEGO/Kerberos to access the PEM web client. For Mozilla Firefox: 1. Open the low-level Firefox configuration page by loading the `about:config` page. 1. In the search box, enter `network.negotiate-auth.trusted-uris`. -1. Double-click the `network.negotiate-auth.trusted-uris` preference and enter the hostname or the domain of the web server that's protected by Kerberos HTTP SPNEGO. Separate multiple domains and hostnames with a comma. +1. Double-click the `network.negotiate-auth.trusted-uris` preference and enter the hostname or the domain of the web server that's protected by Kerberos HTTP SPNEGO. Separate multiple domains and hostnames with commas. 1. In the search box, enter `network.negotiate-auth.delegation-uris`. -1. Double-click the `network.negotiate-auth.delegation-uris` preference and enter the hostname or the domain of the web server that's protected by Kerberos HTTP SPNEGO. Separate multiple domains and hostnames with a comma. +1. Double-click the `network.negotiate-auth.delegation-uris` preference and enter the hostname or the domain of the web server that's protected by Kerberos HTTP SPNEGO. Separate multiple domains and hostnames with commas. 1. Select **OK**. For Google Chrome on Linux or MacOS: -- Add the `--auth-server-whitelist` parameter to the `google-chrome` command. For example, to run Chrome from a Linux prompt, run the `google-chrome` command as follows: +- Add the `--auth-server-whitelist` parameter to the `google-chrome` command. For example, to run Chrome from a Linux prompt, use this `google-chrome` command: ```ini google-chrome --auth-server-whitelist = "hostname/domain" @@ -215,4 +215,4 @@ For Google Chrome on Linux or MacOS: `psql: GSSAPI continuation error: Unspecified GSS failure. Minor code may provide more information` `GSSAPI continuation error: Key version is not available` - Add encryption types to the keytab using ktutil or by recreating the Postgres keytab with all crypto systems from AD. \ No newline at end of file + Add encryption types to the keytab using ktutil or by re-creating the Postgres keytab with all crypto systems from AD.
\ No newline at end of file diff --git a/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_windows_kerberos_server.mdx b/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_windows_kerberos_server.mdx index c7b270895d4..f7068da7f72 100644 --- a/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_windows_kerberos_server.mdx +++ b/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_windows_kerberos_server.mdx @@ -5,7 +5,7 @@ redirects: - /pem/latest/pem_inst_guide_linux/04_installing_postgres_enterprise_manager/07_configuring_the_pem_server_to_use_windows_kerberos_server/ --- -The Windows Active Directory domain service works with hostnames and not with IP addresses. To use single sign-on in PEM Server using Active Directory domain services, configure the following machines with hostnames using the DNS: +The Windows Active Directory domain service works with hostnames and not with IP addresses. To use single sign-on in PEM server using Active Directory domain services, configure the following machines with hostnames using the DNS: - Windows server (domain controller) - PEM server (PEM web server and PEM backend database server) @@ -33,7 +33,7 @@ Create users in Active Directory of the Windows server to map with the HTTP serv 1. Enter the user details. -1. Give the password and make sure to clear **User must change password at next logon**. Also select **User cannot change password** and **Password never expires**. +1. Enter the password and make sure to clear **User must change password at next logon**. Also select **User cannot change password** and **Password never expires**. 1. Review the user details. @@ -41,7 +41,7 @@ Create users in Active Directory of the Windows server to map with the HTTP serv ![PEM Server Web Properties](../../images/pem_server_web_properties_member_of.png) -1. Create the user (for example, pemserverdb) in Active Cirectory of the Windows server to map with the Postgres service principal for the PEM backend database. +1. Create the user (for example, pemserverdb) in Active Directory on the Windows server to map with the Postgres service principal for the PEM backend database. ## 3. Extract key tables from Active Directory @@ -98,7 +98,7 @@ Extract the key tables for the service principals and map them with the respecti ## 4. Configure the PEM backend database server -Add the key table location in the `postgresql.conf` file. +Add the key table location in the `postgresql.conf` file: ```shell krb_server_keyfile='FILE://pemdb.keytab' @@ -147,7 +147,7 @@ $ kinit $ klist ``` -It displays the principal along with the Kerberos ticket. +These commands display the principal along with the Kerberos ticket. !!! Note The `USERNAME@REALM` specified here must be a database user having the pem_admin role and CONNECT privileges on the `pem` database. @@ -160,14 +160,14 @@ Run the PEM configure script on the PEM server to use Kerberos authentication: $ sudo PEM_APP_HOST=pem.edbpem.internal PEM_KRB_KTNAME=/bin/configure-pem-server.sh ``` -In the `config_setup.py` file, configure `PEM_DB_HOST` and check that the value of `PEM_AUTH_METHOD` is set to `'kerberos'`. 
+In the `config_setup.py` file, configure `PEM_DB_HOST` and check that the value of `PEM_AUTH_METHOD` is set to `'kerberos'`: ```shell $ sudo vim /share/web/config_setup.py PEM_DB_HOST=`pem.edbpem.internal` ``` -Configure `HOST` in the `.install-config` file. +Configure `HOST` in the `.install-config` file: ```shell $ sudo vim /share/.install-config @@ -186,7 +186,7 @@ Restart the Apache server: sudo systemctl restart ``` -Edit the entries at the top in `pg_hba.conf` to use the gss authentication method. Then reload the database server. +Edit the entries at the top of `pg_hba.conf` to use the gss authentication method. Then reload the database server: ```shell host pem +pem_user /32 gss @@ -200,11 +200,11 @@ Edit the entries at the top in `pg_hba.conf` to use the gss authentication metho `POSTGRES_SERVICE_NAME` is the service name of the Postgres (PostgreSQL/EDB Postgres Advanced Server) database, for example, `postgresql-13` for PostgreSQL 13 database on RHEL or Rocky Linux platforms. !!! Note - You can't specify the connection type as `hostgssenc`. Windows doesn't support gss encrypted connection. + You can't specify the connection type as `hostgssenc`. Windows doesn't support gss-encrypted connections. ## 7. Browser settings -Configure the browser on the client machine to access the PEM web client to use the Spnego/Kerberos. +Configure the browser on the client machine to use SPNEGO/Kerberos to access the PEM web client. For Mozilla Firefox: @@ -217,7 +217,7 @@ For Mozilla Firefox: For Google Chrome on Linux or MacOS: -- Add the `--auth-server-whitelist` parameter to the `google-chrome` command. For example, to run Chrome from a Linux prompt, run the `google-chrome` command as follows: +- Add the `--auth-server-whitelist` parameter to the `google-chrome` command. For example, to run Chrome from a Linux prompt, use this `google-chrome` command: ```ini google-chrome --auth-server-whitelist = "hostname/domain" @@ -236,4 +236,4 @@ For Google Chrome on Linux or MacOS: `psql: GSSAPI continuation error: Unspecified GSS failure. Minor code may provide more information` `GSSAPI continuation error: Key version is not available` - Add encryption types to the keytab using ktutil or by recreating the Postgres keytab with all crypto systems from AD. + Add encryption types to the keytab using ktutil or by re-creating the Postgres keytab with all crypto systems from AD. diff --git a/product_docs/docs/pem/10/considerations/authentication_options/index.mdx b/product_docs/docs/pem/10/considerations/authentication_options/index.mdx index f10d1793733..7421afdb6d9 100644 --- a/product_docs/docs/pem/10/considerations/authentication_options/index.mdx +++ b/product_docs/docs/pem/10/considerations/authentication_options/index.mdx @@ -9,7 +9,7 @@ navigation: --- -PEM also supports Kerberos and 2FA authentication. For implementation instructions, see: +PEM supports Kerberos and two-factor authentication. For implementation instructions, see: On Linux: diff --git a/product_docs/docs/pem/10/considerations/index.mdx b/product_docs/docs/pem/10/considerations/index.mdx index 8cc82f17176..92f5d7726cb 100644 --- a/product_docs/docs/pem/10/considerations/index.mdx +++ b/product_docs/docs/pem/10/considerations/index.mdx @@ -9,13 +9,12 @@ navigation: - installing_pem_server_and_apache_web_server_preferences --- -There are a number of things to consider before deploying Postgres Enterprise Manager. +Before deploying Postgres Enterprise Manager, consider these factors.
| Considerations | Implementation instructions | | ---------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | | Is a standalone server sufficient or do you need a high availability architecture? | [Installing the server](../installing/) or [Deploying high availability](ha_pem/) | | Do you need to implement connection pooling? | [Deploying connection pooling](pem_pgbouncer/) | -| What type of authentication to use? | [Authentication options](authentication_options/) | +| What type of authentication should you use? | [Authentication options](authentication_options/) | | What actions should you take to avoid security vulnerabilities? | [Securing your deployment](pem_security_best_practices/) | -| Where to host the web server? | [Web server installation options](installing_pem_server_and_apache_web_server_preferences) | - +| Where should you host the web server? | [Web server installation options](installing_pem_server_and_apache_web_server_preferences) | diff --git a/product_docs/docs/pem/10/considerations/installing_pem_server_and_apache_web_server_preferences.mdx b/product_docs/docs/pem/10/considerations/installing_pem_server_and_apache_web_server_preferences.mdx index 5cce3fe4d48..ab2bfa6d9f8 100644 --- a/product_docs/docs/pem/10/considerations/installing_pem_server_and_apache_web_server_preferences.mdx +++ b/product_docs/docs/pem/10/considerations/installing_pem_server_and_apache_web_server_preferences.mdx @@ -9,19 +9,19 @@ redirects: --- -During the PEM server installation, you can specify your hosting preferences for the web server. +While installing the PEM server, you can specify your hosting preferences for the web server. For production environments, best practice is to have the PEM server and web server on separate hosts. ## PEM server and web server on separate hosts 1. Install the PEM server on both the hosts. See [Installing the PEM server](../installing/). -2. Configure the PEM server host by selecting the **Database** option on the first host. -3. Configure a web server by selecting the **Web Services** option on the second host. +1. Configure the PEM server host by selecting the **Database** option on the first host. +1. Configure a web server by selecting the **Web Services** option on the second host. For more information about configuring a PEM server, see [Configuring the PEM server on Linux platforms](../installing/configuring_the_pem_server_on_linux/). ## PEM server and web server on the same host 1. Install the PEM server. See [Installing the PEM server](../installing/). -2. Run the configuration script. Select the **Web Services and Database** option to install the PEM server and web server on the same host. See [Configuring the PEM server on Linux](../installing/configuring_the_pem_server_on_linux/). +1. Run the configuration script. To install the PEM server and web server on the same host, select the **Web Services and Database** option. See [Configuring the PEM server on Linux](../installing/configuring_the_pem_server_on_linux/). 
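+For example, here's a sketch of step 2 on a single host. The script path and `-t` value below reflect typical Linux installations of PEM, but verify both against your own installation (for example, with the script's help output) before running anything:
+
+```shell
+# Hypothetical invocation; confirm the script location and flags on your system.
+# -t 1 is assumed to configure Web Services and Database on the same host,
+# -t 2 Web Services only, and -t 3 Database only.
+sudo /usr/edb/pem/bin/configure-pem-server.sh -t 1
+```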
diff --git a/product_docs/docs/pem/10/considerations/pem_pgbouncer/configuring_pgBouncer.mdx b/product_docs/docs/pem/10/considerations/pem_pgbouncer/configuring_pgBouncer.mdx index 449f0b092a1..8d1943ee306 100644 --- a/product_docs/docs/pem/10/considerations/pem_pgbouncer/configuring_pgBouncer.mdx +++ b/product_docs/docs/pem/10/considerations/pem_pgbouncer/configuring_pgBouncer.mdx @@ -98,10 +98,10 @@ If you're running community PgBouncer, replace the names of the directories, fil ``` !!!note - For more information on `auth_user` see [Authentication settings](https://www.pgbouncer.org/config.html#authentication-settings). + For more information on `auth_user`, see [Authentication settings](https://www.pgbouncer.org/config.html#authentication-settings). !!! -1. Create an HBA file `(/etc/edb/pgbouncer<1.x>/hba_file)` for PgBouncer that contains the following content: +1. Create an HBA file (`/etc/edb/pgbouncer<1.x>/hba_file`) for PgBouncer that contains the following content: ```ini # Use the authentication method scram-sha-256 for local connections diff --git a/product_docs/docs/pem/10/considerations/pem_pgbouncer/index.mdx b/product_docs/docs/pem/10/considerations/pem_pgbouncer/index.mdx index 1208d029130..b2ca47e13ef 100644 --- a/product_docs/docs/pem/10/considerations/pem_pgbouncer/index.mdx +++ b/product_docs/docs/pem/10/considerations/pem_pgbouncer/index.mdx @@ -19,9 +19,9 @@ navigation: You can use PgBouncer as a connection pooler for limiting the number of connections from the PEM agent to the Postgres Enterprise Manager (PEM) server on non-Windows machines: -- [PEM server and agent connection management mechanism](pem_server_pem_agent_connection_management_mechanism) provides an introduction of the PgBouncer-PEM infrastructure. -- [Preparing the PEM database server](preparing_the_pem_database_server) provides information about preparing the PEM database server to be used with PgBouncer. +- [PEM server and agent connection management mechanism](pem_server_pem_agent_connection_management_mechanism) is an introduction to the PgBouncer-PEM infrastructure. +- [Preparing the PEM database server](preparing_the_pem_database_server) provides information about preparing the PEM database server to use with PgBouncer. - [Configuring PgBouncer](configuring_pgBouncer) provides detailed information about configuring PgBouncer to allow it to work with the PEM database server. - [Configuring the PEM agent](configuring_the_pem_agent) provides detailed information about configuring a PEM agent to connect to PgBouncer. -For detailed information about using the PEM web interface, see the [Accessing the web interface](../../pem_web_interface). \ No newline at end of file +For detailed information about using the PEM web interface, see [Accessing the web interface](../../pem_web_interface). \ No newline at end of file diff --git a/product_docs/docs/pem/10/considerations/pem_pgbouncer/preparing_the_pem_database_server.mdx b/product_docs/docs/pem/10/considerations/pem_pgbouncer/preparing_the_pem_database_server.mdx index d5ccc809c33..4fa7b618ebe 100644 --- a/product_docs/docs/pem/10/considerations/pem_pgbouncer/preparing_the_pem_database_server.mdx +++ b/product_docs/docs/pem/10/considerations/pem_pgbouncer/preparing_the_pem_database_server.mdx @@ -26,7 +26,7 @@ This example shows how to prepare the PEM database server with the enterprisedb ## Creating users and roles for PgBouncer-PEM connections -1. Create a dedicated user named pgbouncer with `pem_agent_pool` membership. 
This user will serve connections from PgBouncer to the PEM database by forwarding all agent database queries. +1. Create a dedicated user named pgbouncer with pem_agent_pool membership. This user will serve connections from PgBouncer to the PEM database by forwarding all agent database queries. ```sql CREATE ROLE pgbouncer PASSWORD 'ANY_PASSWORD' LOGIN; @@ -84,7 +84,7 @@ This example shows how to prepare the PEM database server with the enterprisedb GRANT ``` -1. Use the `pem.create_proxy_agent_user(varchar)` function to create a user named pem_agent_user1. This proxy user will serve connections between all Agents and PgBouncer. +1. Use the `pem.create_proxy_agent_user(varchar)` function to create a user named pem_agent_user1. This proxy user will serve connections between all agents and PgBouncer. ```sql SELECT pem.create_proxy_agent_user('pem_agent_user1'); @@ -98,9 +98,9 @@ This example shows how to prepare the PEM database server with the enterprisedb ## Updating the configuration files to allow PgBouncer-PEM connections -1. Allow the pgbouncer user to connect to the `pem` database using the SSL authentication method by adding the `hostssl pem` entry in the `pg_hba.conf` file of the PEM database server. +1. Allow the pgbouncer user to connect to the `pem` database using the SSL authentication method. To do so, add the `hostssl pem` entry in the `pg_hba.conf` file of the PEM database server. - In the list of rules, ensure you place the `hostssl pem` entry before any other rules assigned to the `+pem_agent` user. + In the list of rules, be sure to place the `hostssl pem` entry before any other rules assigned to the +pem_agent user. ```shell # Allow the PEM agent proxy user (used by pgbouncer) @@ -149,7 +149,7 @@ This example runs EDB Postgres Advanced Server on RHEL. When setting your enviro
-1. Set the `$USER_HOME` environment variable to the home directory accesible to the user: +1. Set the `$USER_HOME` environment variable to the home directory accessible to the user: ```shell export USER_HOME=/var/lib/edb @@ -187,9 +187,9 @@ This example runs EDB Postgres Advanced Server on RHEL. When setting your enviro openssl x509 -req -days 365 -in pem_agent_pool.csr -CA $DATA_DIR/ca_certificate.crt -CAkey $DATA_DIR/ca_key.key -CAcreateserial -out pem_agent_pool.crt ``` -1. Move the created key and certificate to a path the `enterprisedb` user can access. +1. Move the created key and certificate to a path the enterprisedb user can access. - In this example, create a folder called `~/.postgresql` in the home directory of the `enterprisedb` user and ensure it has permissions: + In this example, create a folder `~/.postgresql` in the home directory of the enterprisedb user and ensure it has the appropriate permissions: ``` mkdir -p $USER_HOME/.postgresql diff --git a/product_docs/docs/pem/10/considerations/pem_security_best_practices/apache_httpd_security_configuration.mdx b/product_docs/docs/pem/10/considerations/pem_security_best_practices/apache_httpd_security_configuration.mdx index 4d2c4f81ce6..668084b9f9d 100644 --- a/product_docs/docs/pem/10/considerations/pem_security_best_practices/apache_httpd_security_configuration.mdx +++ b/product_docs/docs/pem/10/considerations/pem_security_best_practices/apache_httpd_security_configuration.mdx @@ -8,19 +8,19 @@ redirects: - /pem/latest/installing_pem_server/pem_security_best_practices/apache_httpd_security_configuration/ --- -This page details how to secure the PEM web server. +You can configure the security of the PEM web server. On Windows, the supported web server is Apache HTTPD. Apache HTTPD is bundled with PEM under the name PEM HTTPD. -The Apache HTTPD configuration file is `pem.conf` and the SSL configuration file is `httpd-ssl-pem.conf`. Both configuration files are in the `/conf/addons` directory. +The Apache HTTPD configuration file is `pem.conf`, and the SSL configuration file is `httpd-ssl-pem.conf`. Both configuration files are in the `/conf/addons` directory. On Linux, both NGINX and Apache HTTPD are supported. The NGINX configuration file is `/etc/nginx/conf.d/edb-pem.conf` on RHEL-like systems and `/etc/nginx/sites-available/edb-pem.conf` on Debian-like systems. -the Apache HTTPD configuration file is `edb-pem.conf` and the SSL configuration file is `edb-ssl-pem.conf`. Both configurations files are in the `/conf.d` directory. +The Apache HTTPD configuration file is `edb-pem.conf`, and the SSL configuration file is `edb-ssl-pem.conf`. Both configuration files are in the `/conf.d` directory. ## Recommendations applied by default These recommendations are applied by default in new installations of PEM. -If you have customized your web server configuration, or carried it over from a much older version of PEM, you can use this information to verify that your configuration meets current standards. +If you customized your web server configuration or carried it over from a much older version of PEM, you can use this information to verify that your configuration meets current standards. ### Disable insecure SSL and TLS protocols @@ -42,8 +42,8 @@ SSLProtocol -All TLSv1.2 SSLProxyProtocol -All TLSv1.2 ``` -You can verify that TLS 1.1 is disabled using the following command, replacing the URL with that of your web server. -A return value of 35 means TLS 1.1 is disabled whereas 0 means it is enabled.
+You can verify that TLS 1.1 is disabled using the following command. Replace the URL with your web server's. +A return value of 35 means TLS 1.1 is disabled; a return value of 0 means it's enabled. ```shell curl -k -v -s --tls-max 1.1 https://pem-server:8443 >/dev/null 2>&1; echo $? @@ -51,7 +51,7 @@ curl -k -v -s --tls-max 1.1 https://pem-server:8443 >/dev/null 2>&1; echo $? ### Disable web server information exposure -In new installations of PEM, the web server is configured to minimize the information about the server exposed to clients by disabling server tokens (which expose information about the server in response headers) and server signatures (which expose information in the footers server-generated pages such as error messages). +In new installations of PEM, the web server is configured to minimize the information about the server exposed to clients by disabling server tokens and server signatures. Server tokens expose information about the server in response headers. Server signatures expose information in the footers of server-generated pages such as error messages. For NGINX, PEM adds the following line to the configuration file: @@ -71,11 +71,11 @@ ServerSignature Off The directory listing allows an attacker to view the complete contents of directories from which content is served. This listing might lead to the attacker reverse engineering an application to obtain the source code, analyze it for possible security flaws, and discover more information about an application. -To avoid this risk, PEM disables directory listing +To avoid this risk, PEM disables directory listing. For NGINX, PEM sets `autoindex: off`. -For Apache HTTPD, PEM sets setting the `Options -Indexes` directive: +For Apache HTTPD, PEM sets the `Options -Indexes` directive: ```shell Options -Indexes @@ -86,7 +86,7 @@ For Apache HTTPD, PEM sets setting the `Options -Indexes` directive: The TRACE and TRACK HTTP methods are used for debugging servers. When an HTTP TRACE request is sent to a supported web server, the server responds and echoes the data passed to it, including any HTTP headers. We recommend that you disable these methods in the Apache configuration. In NGINX, TRACK and TRACE methods are disabled by default. In Apache HTTPD, PEM includes the following lines in the configuration file to reject these methods. -Note that some scanners do not understand this syntax, so may incorrectly report that these methods are allowed. +Some scanners don't understand this syntax and may incorrectly report that these methods are allowed. ```shell RewriteEngine on RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK|OPTIONS) RewriteRule .\* - [F] ``` -You can verify that TRACK and TRACE are disabled with the following commands replacing the URL with that of your web server. -A return value of 35 means TLS 1.1 is disabled whereas 0 means it is enabled. If the methods are disabled, the command will return an HTML response including the text `405 Method Not Allowed` or similar. +You can verify that TRACK and TRACE are disabled with the following commands. Replace the URL with your web server's. +If the methods are disabled, the command returns an HTML response that includes the text `405 Method Not Allowed` or similar. ```shell curl -kL -X TRACK https://pem-server:8443/pem curl -kL -X TRACE https://pem-server:8443/pem ``` ## Optimize HTTP headers for security PEM sets various HTTP header options to improve security.
-These settings are defined in the `config.py` and and `config_distro.py` files. +These settings are defined in the `config.py` and `config_distro.py` files. This file is located at `/usr/edb/pem/web` on Linux and at `C:\ProgramFiles\edb\pem\server\share\web` on Windows. -If you wish to alter any of these settings, you should not edit these files, but instead create (or edit if it already exists) a file named `config_local.py` in the same location and add your desired settings. -These settings will override those in the `config.py` and and `config_distro.py` files and will not be overwritten during a PEM upgrade. +If you want to alter any of these settings, don't edit these files. Instead create (or edit if it already exists) a file named `config_local.py` in the same location and add your desired settings. +These settings override those in the `config.py` and `config_distro.py` files. They aren't overwritten during a PEM upgrade. For detailed information on the `config.py` file, see [Managing configuration settings](../../managing_configuration_settings/). #### X-Frame-Options -X-Frame-Options indicate whether a browser is allowed to render a page in an <iframe> tag. It specifically protects against clickjacking. PEM has a host validation `X_FRAME_OPTIONS` option to prevent these attacks, which you can configure in the `config_local.py` file. The default is: +X-Frame-Options indicates whether a browser is allowed to render a page in an <iframe> tag. It specifically protects against clickjacking. PEM has a host validation `X_FRAME_OPTIONS` option to prevent these attacks, which you can configure in the `config_local.py` file. The default is: ```ini X_FRAME_OPTIONS = "SAMEORIGIN" @@ -123,7 +123,7 @@ X_FRAME_OPTIONS = "SAMEORIGIN" #### Content-Security-Policy -Content-Security-Policy is part of the HTML5 standard. It provides a broader range of protection than the X-Frame-Options header, which it replaces. It is designed so that website authors can whitelist domains. The authors can load resources (like scripts, stylesheets, and fonts) from the whitelisted domains and also from domains that can embed a page. +Content-Security-Policy is part of the HTML5 standard. It provides a broader range of protection than the X-Frame-Options header, which it replaces. It's designed so that website authors can whitelist domains. The authors can load resources (like scripts, stylesheets, and fonts) from the whitelisted domains and also from domains that can embed a page. PEM has a host validation `CONTENT_SECURITY_POLICY` option to prevent attacks, which you can configure in the `config_local.py` file. The default is: @@ -168,7 +168,7 @@ Cookies are small packets of data that a server sends to your browser to store c SESSION_COOKIE_SECURE = True ``` -- SESSION_COOKIE_HTTPONLY — By default, JavaScript can read the content of cookies. The `HTTPOnly` flag prevents scripts from reading the cookie. Instead, the browser uses the cookie only with HTTP or HTTPS requests. Hackers can't exploit XSS vulnerabilities to learn the contents of the cookie. For example, the `sessionId` cookie never requires that it be read with a client-side script. So, you can set the `HTTPOnly` flag for `sessionId` cookies. The default is: +- SESSION_COOKIE_HTTPONLY — By default, JavaScript can read the content of cookies. The `HTTPOnly` flag prevents scripts from reading the cookie. Instead, the browser uses the cookie only with HTTP or HTTPS requests. Hackers can't exploit XSS vulnerabilities to learn the contents of the cookie. 
+ For example, the `sessionId` cookie never needs to be read by a client-side script. So, you can set the `HTTPOnly` flag for `sessionId` cookies. The default is: ```ini SESSION_COOKIE_HTTPONLY = True @@ -181,38 +181,38 @@ Cookies are small packets of data that a server sends to your browser to store c ``` !!! Note - This option can cause problems when the server deploys in dynamic IP address hosting environments, such as Kubernetes or behind load balancers. In such cases, set this option to `False`. + This option can cause problems when the server is deployed in dynamic IP address hosting environments, such as Kubernetes or behind load balancers. In these cases, set this option to `False`. To apply the changes, restart the web server. - For detailed information on `config.py` file, see [Managing Configuration Settings](../../managing_configuration_settings/). + For detailed information on the `config.py` file, see [Managing configuration settings](../../managing_configuration_settings/). ## Additional recommendations that can be applied manually -These recommendations are not applied automatically because they require additional information or action specific to the environment in which PEM is deployed. +These recommendations aren't applied automatically because they require additional information or action specific to the environment in which PEM is deployed. ### Secure HTTPD with SSL certificates During PEM configuration, a self-signed certificate is generated to secure traffic between the web server and clients. -To enhance security and to prevent browser warnings that the site is not secure, we recommend that you [replace this certificate with one signed by a trusted certificate authority](../../certificates/index.mdx/#web-server-certificates). +To enhance security and to prevent browser warnings that the site isn't secure, we recommend that you [replace this certificate with one signed by a trusted certificate authority](../../certificates/index.mdx/#web-server-certificates). ### Run the web server from a non-privileged user account -On Linux, PEM utilizes web server packages provided by the OS. Typically, these create a service unit which runs the web server as the root user. +On Linux, PEM uses web server packages provided by the OS. Typically, these create a service unit that runs the web server as the root user. Running the web server as a root user can create a security issue. We recommend that you run the web server as a unique non-privileged user. Doing so helps to secure any other services running during a security breach. !!! Note Variations in WSGI service by platform -PEM runs as a WSGI application. On Linux, when the web server is NGINX, the WSGI application is run by a separate service, `edb-uwsgi`, which runs as the `pem` user. -When the web server is Apache HTTPD, the WSGI application is run by a daemon process which is a child of the Apache HTTPD process. The daemon process is run as the `pem` user. +PEM runs as a WSGI application. On Linux, when the web server is NGINX, the WSGI application is run by a separate service, `edb-uwsgi`, which runs as the pem user. +When the web server is Apache HTTPD, the WSGI application is run by a daemon process that's a child of the Apache HTTPD process. The daemon process is run as the pem user. -On Windows, the `WSGIDaemonProcess` directive and features aren't available so both the web server and the WSGI app run as the system user (the `LocalSystem` account).
+On Windows, the `WSGIDaemonProcess` directive and features aren't available, so both the web server and the WSGI app run as the system user (the `LocalSystem` account). !!! ### Restrict the access to a network or IP address -It is good practice to restrict access to the web server to the smallest set of IP addresses compatible with your business needs. -This is most commonly done at the network infrastructure level, for example through firewall configuration, but can also be enforced by the web server. +It's a good practice to restrict access to the web server to the smallest set of IP addresses compatible with your business needs. +This is most commonly done at the network infrastructure level, for example, through firewall configuration, but can also be enforced by the web server. The PEM application configuration file (`/web/config_local.py`) supports an `ALLOWED_HOSTS` configuration parameter for this purpose. For example: diff --git a/product_docs/docs/pem/10/considerations/pem_security_best_practices/index.mdx b/product_docs/docs/pem/10/considerations/pem_security_best_practices/index.mdx index d8235a3a292..d653625c537 100644 --- a/product_docs/docs/pem/10/considerations/pem_security_best_practices/index.mdx +++ b/product_docs/docs/pem/10/considerations/pem_security_best_practices/index.mdx @@ -14,14 +14,14 @@ navigation: To harden your PEM deployment against attack, consider the following measures: -1. Ensure PEM itself, your operating system, and third party libraries are regularly updated. Without the most recent security patches, your system is vulnerable to cyberattacks. - Please refer to the [Dependencies](../../installing/dependencies.mdx) page to learn more about the system packages used by PEM. +- Ensure PEM, your operating system, and third-party libraries are regularly updated. Without the most recent security patches, your system is vulnerable to cyberattacks. + See [Dependencies](../../installing/dependencies.mdx) to learn more about the system packages used by PEM. -2. Ensure the Postgres instance used as the PEM server is kept up to date and apply [Postgres security best practices](https://info.enterprisedb.com/rs/069-ALB-339/images/Security-best-practices-2020.pdf). +- Ensure the Postgres instance used as the PEM server is kept up to date and apply [Postgres security best practices](https://info.enterprisedb.com/rs/069-ALB-339/images/Security-best-practices-2020.pdf). -3. [Secure the web server](apache_httpd_security_configuration.mdx) +- [Secure the web server](apache_httpd_security_configuration.mdx). -4. Configure the [security settings of the PEM web application](pem_application_configuration.mdx) as appropriate. +- Configure the [security settings of the PEM web application](pem_application_configuration.mdx) as appropriate. diff --git a/product_docs/docs/pem/10/considerations/pem_security_best_practices/pem_application_configuration.mdx b/product_docs/docs/pem/10/considerations/pem_security_best_practices/pem_application_configuration.mdx index e45a802d116..2bd09fe78f1 100644 --- a/product_docs/docs/pem/10/considerations/pem_security_best_practices/pem_application_configuration.mdx +++ b/product_docs/docs/pem/10/considerations/pem_security_best_practices/pem_application_configuration.mdx @@ -9,30 +9,29 @@ redirects: ## Session timeout -Insufficient session expiration by the web application increases the exposure of other session-based attacks. The attacker has more time to reuse a valid session ID and hijack the associated session. 
The shorter the session interval is, the less time an attacker has to use the valid session ID. We recommend that you set the inactivity timeout for the web application to a low value to avoid this security issue. +Setting session expiration time too long in the web application increases the exposure of other session-based attacks. The attacker has more time to reuse a valid session ID and hijack the associated session. The shorter the session interval is, the less time an attacker has to use the valid session ID. To avoid this security issue, we recommend that you set the inactivity timeout for the web application to a low value. -In PEM, you can set the timeout value for a user session. When there's no user activity for a specified duration on the web console, PEM logs out the user from the web console. A PEM administrator can set the length of time for inactivity. This value is for the whole application and not for each user. To configure the timeout duration, modify the `USER_INACTIVITY_TIMEOUT` parameter in the `config_local.py` file, located in the `/web` directory. By default, this functionality is disabled. +In PEM, you can set the timeout value for a user session. When there's no user activity for a specified duration on the web console, PEM logs the user out of the web console. A PEM administrator can set the length of time for inactivity. This value is for the whole application, not for each user. -For example, to specify for an application to log out a user after 15 minutes of inactivity, set: +To configure the timeout duration, modify the `USER_INACTIVITY_TIMEOUT` parameter in the `config_local.py` file in the `/web` directory. By default, this parameter is disabled. Specify the value in seconds. + +For example, to specify for an application to log a user out after 15 minutes of inactivity, set the time as follows: ```ini USER_INACTIVITY_TIMEOUT = 900 ``` -!!! Note - The timeout value is specified in seconds. - -To apply the changes, restart the Apache service. +To apply the change, restart the Apache service. -For detailed information on the `config.py` file, see [Managing Configuration Settings](../../managing_configuration_settings/). +For detailed information on the `config.py` file, see [Managing configuration settings](../../managing_configuration_settings/). ## RestAPI header customization -You can customize the RestAPI token headers to meet your requirements. The default values aren't exposed by the `config.py` file. Customize the following headers in the `config_local.py` file: +You can customize the RestAPI token headers to meet your requirements. The default values aren't exposed by the `config.py` file. In the `config_local.py` file, customize the following headers. ### PEM_HEADER_SUBJECT_TOKEN_KEY -This configuration option allows you to change the HTTP header name to get the generated token. By default, when you send a request to create a token, the server response has an `X-Subject-Token` header. This header contains the value of a newly generated token. If you want to customize the header name, then you can update the `config_local.py` file: +This configuration option lets you change the HTTP header name to get the generated token. By default, when you send a request to create a token, the server response has an `X-Subject-Token` header. This header contains the value of a newly generated token. 
If you want to customize the header name, then you can update the `config_local.py` file: ```ini PEM_HEADER_SUBJECT_TOKEN_KEY = 'Pem-RestAPI-Generate-Token' @@ -51,13 +50,13 @@ Pem-RestAPI-Generate-Token: 997aef95-d46d-4d84-932a-a80146eaf84f ### PEM_HEADER_TOKEN_KEY -This configuration option allows you to change the HTTP request header name. With this header name, you can send the token to the PEM server. By default, when you send a request to generate a token, the token header name is `X-Auth-Token`. If you want to customize the RestAPI request header name, then you can update the `config_local.py` file: +This configuration option lets you change the header name of the HTTP request. With this header name, you can send the token to the PEM server. By default, when you send a request to generate a token, the token header name is `X-Auth-Token`. If you want to customize the RestAPI request header name, you can update the `config_local.py` file: ```ini PEM_HEADER_TOKEN_KEY = 'Pem-Token' ``` -This setting allows you to send the token: +This setting lets you send the token: ```shell $ curl -Lk -X GET -H "Pem-Token: gw5rzaloxydp91ttd1c97w24b5sv60clic24sxy9" https://localhost:8443/pem/api/v4/agent @@ -65,41 +64,41 @@ $ curl -Lk -X GET -H "Pem-Token: gw5rzaloxydp91ttd1c97w24b5sv60clic24sxy9" https ### PEM_TOKEN_EXPIRY -This configuration option allows you to change the PEM RestAPI token expiry time after it's generated. By default, the token expiry time is set to 20 minutes (1200 seconds). If you want to change the token expiry time to 10 minutes, then you can update the `config_local.py` file: +This configuration option lets you change the PEM RestAPI token expiry time after it's generated. By default, the token expiry time is set to 20 minutes (1200 seconds). For example, to change the token expiry time to 10 minutes, update the `config_local.py` file as follows: ```ini PEM_TOKEN_EXPIRY = 600 ``` -To apply the changes, restart the Apache service. +To apply the change, restart the Apache service. ## Role-based access control in PEM -Role-based access control (RBAC) restricts application access based on a user’s role in an organization and is one of the primary methods for access control. The roles in RBAC refer to the levels of access that users have to the application. Users are allowed to access only the information needed to do their jobs. Roles in PEM are inheritable and additive, rather than subscriptive. In other words, as a PEM admin you need to grant the lowest level role to the user and then grant the roles the user needs to perform their job. For example, to give access only to SQL profiler: +Role-based access control (RBAC) restricts application access based on a user’s role in an organization. It's one of the primary methods for access control. The roles in RBAC refer to the levels of access that users have to the application. Users are allowed to access only the information needed to do their jobs. Roles in PEM are inheritable and additive rather than subscriptive. In other words, as a PEM admin, you need to grant the lowest level role to the user and then grant the roles the user needs to perform their job. 
For example, to give access only to SQL Profiler: ```sql CREATE ROLE user_sql_profiler WITH LOGIN NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT NOREPLICATION CONNECTION LIMIT -1 PASSWORD 'xxxxxx'; GRANT pem_user, pem_comp_sqlprofiler TO user_sql_profiler; ``` -For detailed information on roles, see [PEM Roles](../../managing_pem_server/#using-pem-predefined-roles-to-manage-access-to-pem-functionality). +For detailed information on roles, see [PEM roles](../../managing_pem_server/#using-pem-predefined-roles-to-manage-access-to-pem-functionality). ## SQL/Protect plugin -Often, preventing an SQL injection attack is the responsibility of the application developer, while the database administrator has little or no control over the potential threat. The difficulty for database administrators is that the application must have access to the data to function properly. +Often, preventing an SQL injection attack is the responsibility of the application developer. The database administrator has little or no control over the potential threat. The difficulty for database administrators is that the application must have access to the data to function properly. SQL/Protect is a module that allows a database administrator to protect a database from SQL injection attacks. SQL/Protect examines incoming queries for typical SQL injection profiles in addition to the standard database security policies. -Attackers can perpetrate SQL injection attacks with several different techniques. A specific signature characterizes each technique. SQL/Protect examines queries for unauthorized relations, utility commands, SQL tautology, and unbounded DML statements. SQL/Protect gives the control back to the database administrator by alerting the administrator to potentially dangerous queries and then blocking those queries. +Attackers can perpetrate SQL injection attacks using several different techniques. A specific signature characterizes each technique. SQL/Protect examines queries for unauthorized relations, utility commands, SQL tautology, and unbounded DML statements. SQL/Protect gives the control back to the database administrator by alerting the administrator to potentially dangerous queries and then blocking those queries. !!! Note - This plugin works only on the EDB Postgres Advanced Server server, so this is useful only when your PEM database is hosted on the EDB Postgres Advanced Server server. + This plugin is useful only when your PEM database is hosted on EDB Postgres Advanced Server. It doesn't work on other servers. For detailed information about the SQL Profiler plugin, see [SQL Profiler](../../profiling_workloads/). ## Password management -One security tip for PEM administrative users is to change your PEM login passwords to something new regularly. Changing your password: +One security tip for PEM administrative users is to regularly change your PEM login passwords. Changing your password: - Prevents breaches of multiple accounts - Prevents constant access @@ -110,7 +109,9 @@ One security tip for PEM administrative users is to change your PEM login passwo In most cases, pemAgent is installed as a root user and runs as a daemon process with root privileges. By default, PEM disables running the scheduled jobs/task. PEM provides support for running scheduled jobs as a non-root user by changing the pemAgent configuration file.
-To run scheduled jobs as a non-root user, modify the entry for the `batch_script_user` parameter in the `agent.cfg` file and specify the user to run the script. You can either specify a non-root user or root user identity. If you don't specify a user, or the specified user doesn't exist, then the script doesn't execute. Restart the agent after modifying the file. If a non-root user is running `pemagent`, then the value of `batch_script_user` is ignored, and the same non-root user used for running the `pemagent` executes the script. +To run scheduled jobs as a non-root user, modify the entry for the `batch_script_user` parameter in the `agent.cfg` file and specify the user to run the script. You can specify either a non-root user or root user identity. If you don't specify a user or the specified user doesn't exist, the script doesn't execute. + +After modifying the file, restart the agent. If a non-root user is running pemAgent, the value of `batch_script_user` is ignored. The same non-root user used for running the pemAgent executes the script. To invoke a script on a Windows system, set the registry entry for `AllowBatchJobSteps` to `true` and restart the PEM agent. PEM registry entries are located in: diff --git a/src/pages/index.js b/src/pages/index.js index 4b30882af28..0b455c4443e 100644 --- a/src/pages/index.js +++ b/src/pages/index.js @@ -282,7 +282,7 @@ const Page = () => { Get Started with Pipelines - New: AI Accelerator Preparers + AI Accelerator Preparers PGvector