From 7dfa3cd8094309756f57a2bee0eebba23951ca12 Mon Sep 17 00:00:00 2001
From: Betsy Gitelman
Date: Fri, 25 Apr 2025 13:52:48 -0400
Subject: [PATCH 01/42] Edits to PEM 10 - first set
---
.../base.njk | 2 +-
.../02_index_advisor/index.mdx | 2 +-
.../02_index_advisor/index.mdx | 2 +-
.../index_advisor_overview.mdx | 2 +-
.../docs/pem/10/certificates/index.mdx | 61 ++++++++++---------
5 files changed, 35 insertions(+), 34 deletions(-)
diff --git a/install_template/templates/products/postgres-enterprise-manager-server/base.njk b/install_template/templates/products/postgres-enterprise-manager-server/base.njk
index 2d70938e492..cf262425e2d 100644
--- a/install_template/templates/products/postgres-enterprise-manager-server/base.njk
+++ b/install_template/templates/products/postgres-enterprise-manager-server/base.njk
@@ -93,7 +93,7 @@ For more details, see [Configuring the PEM server on Linux](../configuring_the_p
!!! Note
- - The operating system user pem is created while installing the PEM server. The pem application data and the session is saved to this user's home directory.
+  - The operating system user pem is created while installing the PEM server. The PEM application data and the session are saved to this user's home directory.
## Supported locales
diff --git a/product_docs/docs/epas/13/epas_guide/03_database_administration/02_index_advisor/index.mdx b/product_docs/docs/epas/13/epas_guide/03_database_administration/02_index_advisor/index.mdx
index 0efd15b776d..70e73546fec 100644
--- a/product_docs/docs/epas/13/epas_guide/03_database_administration/02_index_advisor/index.mdx
+++ b/product_docs/docs/epas/13/epas_guide/03_database_administration/02_index_advisor/index.mdx
@@ -18,7 +18,7 @@ There are three ways to use Index Advisor to analyze SQL queries:
- Provide queries at the EDB-PSQL command line that you want Index Advisor to analyze.
-- Access Index Advisor through the Postgres Enterprise Manager client. When accessed via the PEM client, Index Advisor works with SQL Profiler, providing indexing recommendations on code captured in SQL traces. For more information about using SQL Profiler with PEM, see the [Using the SQL Profiler](/pem/latest/profiling_workloads/using_sql_profiler.mdx) and [Using the Index Advisor](03_using_index_advisor.mdx).
+- Access Index Advisor through the Postgres Enterprise Manager client. When accessed via the PEM client, Index Advisor works with SQL Profiler, providing indexing recommendations on code captured in SQL traces. For more information about using SQL Profiler with PEM, see [Using SQL Profiler](/pem/latest/profiling_workloads/using_sql_profiler.mdx) and [Using Index Advisor](03_using_index_advisor.mdx).
Index Advisor will attempt to make indexing recommendations on `INSERT`, `UPDATE`, `DELETE` and `SELECT` statements. When invoking Index Advisor, you supply the workload in the form of a set of queries (if you are providing the command in an SQL file) or an `EXPLAIN` statement (if you are specifying the SQL statement at the psql command line). Index Advisor displays the query plan and estimated execution cost for the supplied query, but does not actually execute the query.
diff --git a/product_docs/docs/epas/14/epas_guide/03_database_administration/02_index_advisor/index.mdx b/product_docs/docs/epas/14/epas_guide/03_database_administration/02_index_advisor/index.mdx
index fcc282d6b6b..b786a3510e3 100644
--- a/product_docs/docs/epas/14/epas_guide/03_database_administration/02_index_advisor/index.mdx
+++ b/product_docs/docs/epas/14/epas_guide/03_database_administration/02_index_advisor/index.mdx
@@ -16,7 +16,7 @@ You can use Index Advisor to analyze SQL queries in any of these ways:
- Invoke the Index Advisor utility program, supplying a text file containing the SQL queries that you want to analyze. Index Advisor generates a text file with `CREATE INDEX` statements for the recommended indexes.
- Provide queries at the EDB-PSQL command line that you want Index Advisor to analyze.
-- Access Index Advisor through the Postgres Enterprise Manager (PEM) client. When accessed using the PEM client, Index Advisor works with SQL Profiler, providing indexing recommendations on code captured in SQL traces. For more information about using SQL Profiler with PEM, see [Using the SQL Profiler](/pem/latest/profiling_workloads/using_sql_profiler.mdx) and [Using the Index Advisor](03_using_index_advisor.mdx).
+- Access Index Advisor through the Postgres Enterprise Manager (PEM) client. When accessed using the PEM client, Index Advisor works with SQL Profiler, providing indexing recommendations on code captured in SQL traces. For more information about using SQL Profiler with PEM, see [Using SQL Profiler](/pem/latest/profiling_workloads/using_sql_profiler.mdx) and [Using Index Advisor](03_using_index_advisor.mdx).
Index Advisor attempts to make indexing recommendations on `INSERT`, `UPDATE`, `DELETE`, and `SELECT` statements. When invoking Index Advisor, you supply the workload in the form of either:
- If you're providing the command in an SQL file, a set of queries
diff --git a/product_docs/docs/epas/15/managing_performance/02_index_advisor/index_advisor_overview.mdx b/product_docs/docs/epas/15/managing_performance/02_index_advisor/index_advisor_overview.mdx
index 070db8d1eeb..6aae0ed585c 100644
--- a/product_docs/docs/epas/15/managing_performance/02_index_advisor/index_advisor_overview.mdx
+++ b/product_docs/docs/epas/15/managing_performance/02_index_advisor/index_advisor_overview.mdx
@@ -9,7 +9,7 @@ You can use Index Advisor to analyze SQL queries in any of these ways:
- Invoke the Index Advisor utility program, supplying a text file containing the SQL queries that you want to analyze. Index Advisor generates a text file with `CREATE INDEX` statements for the recommended indexes.
- Provide queries at the EDB-PSQL command line that you want Index Advisor to analyze.
-- Access Index Advisor through the Postgres Enterprise Manager (PEM) client. When accessed using the PEM client, Index Advisor works with SQL Profiler, providing indexing recommendations on code captured in SQL traces. For more information about using SQL Profiler and Index Advisor with PEM, see [Using the SQL profiler](/pem/latest/profiling_workloads/using_sql_profiler.mdx) and [Using the Index Advisor](03_using_index_advisor.mdx).
+- Access Index Advisor through the Postgres Enterprise Manager (PEM) client. When accessed using the PEM client, Index Advisor works with SQL Profiler, providing indexing recommendations on code captured in SQL traces. For more information about using SQL Profiler and Index Advisor with PEM, see [Using SQL Profiler](/pem/latest/profiling_workloads/using_sql_profiler.mdx) and [Using Index Advisor](03_using_index_advisor.mdx).
Index Advisor attempts to make indexing recommendations on `INSERT`, `UPDATE`, `DELETE`, and `SELECT` statements. When invoking Index Advisor, you supply the workload in the form of either:
diff --git a/product_docs/docs/pem/10/certificates/index.mdx b/product_docs/docs/pem/10/certificates/index.mdx
index 80376810b81..24f35c88753 100644
--- a/product_docs/docs/pem/10/certificates/index.mdx
+++ b/product_docs/docs/pem/10/certificates/index.mdx
@@ -18,10 +18,10 @@ PEM uses SSL certificates:
- To secure requests to the [web server](#web-server-certificates), which provides the user interface and REST API.
- To secure and authenticate the [PEM agent connections to the PEM backend database](#pem-backend-database-server-and-agent-connection-certificates).
-## Web-server certificates
+## Web server certificates
PEM generates an SSL certificate and key file for the web server during initial configuration.
-Because the certificate is self-signed, users will see a warning that the site is insecure when they open the PEM web application URL in their browser.
+Because the certificate is self-signed, users see a warning that the site is insecure when they open the PEM web application URL in a browser.
To increase security and remove this warning, you can replace the self-signed SSL certificate with a certificate signed by a trusted certificate authority.
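As a sketch of the typical workflow (the key size, file names, and `-subj` values here are illustrative placeholders, not PEM defaults), you generate a private key and a certificate signing request (CSR) and submit the CSR to your certificate authority:

```shell
# Generate a private key and a CSR to submit to a certificate authority.
# File names and subject fields are illustrative placeholders.
openssl genrsa -out your_private.key 4096
openssl req -new -key your_private.key -out your_domain_name.csr \
  -subj '/C=US/ST=MA/L=Boston/O=Example/CN=pem.example.com'
```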
@@ -37,13 +37,13 @@ Change the server name and file paths in the configuration file to match your ce
```text
server {
# lines omitted here
- server_name yourdomain.com;
+ server_name ;
# lines omitted here
}
server {
# lines omitted here
- server_name yourdomain.com;
+ server_name ;
ssl_certificate /path/to/your_domain_name.crt;
ssl_certificate_key /path/to/your_private.key;
@@ -70,12 +70,12 @@ For a worked example, see [Replacing httpd self-signed SSL certificates](https:/
## PEM backend database server and agent connection certificates
PEM implements secured SSL/TLS connections between PEM agents and the backend database.
-Each agent has an SSL certificate which is used both to encrypt its communication with the server and to authenticate with the server in place of a password.
+Each agent has an SSL certificate that's used both to encrypt its communication with the server and to authenticate with the server in place of a password.
-PEM uses the sslutils extension to allow the PEM server to generate and sign SSL certificates and keys. When a new agent is registered, the PEM server automatically issues it with a certificate.
+PEM uses the sslutils extension to allow the PEM server to generate and sign SSL certificates and keys. When a new agent is registered, the PEM server issues it a certificate.
Certificates issued by the PEM server are signed by the PEM server, meaning the PEM server is acting as a certificate authority (CA).
-If the above is not suitable, you can use SSL certificates and keys generated outside of PEM and signed by a trusted CA.
+If this approach isn't suitable, you can use SSL certificates and keys generated outside of PEM and signed by a trusted CA.
For more information, see [Trusted CA certificates and keys](#use-certificates-and-keys-signed-by-trusted-ca).
### Certificates and key files on the PEM server
@@ -90,7 +90,7 @@ During initial configuration of the PEM server, the following files are generate
- `server.key`
The `ca_certificate.crt` and `ca_key.key` files are used by the PEM server to sign certificates generated for agents during agent registration.
-They are also used to sign `server.crt`. Unless replaced manually, the 'ca_certificate.crt' file is a self-signed certificate because is acting as the root CA.
+They're also used to sign `server.crt`. Unless replaced manually, the `ca_certificate.crt` file is a self-signed certificate because it's acting as the root CA.
The `root.crt` file is a copy of the `ca_certificate.crt` file. The `ssl_ca_file` parameter in the `postgresql.conf` file points to this file.
@@ -100,33 +100,33 @@ The `ssl_crl_file` parameter in the `postgresql.conf` file points to this file.
The `server.crt` file is the signed certificate for the PEM server, and the `server.key` file is the private key to the certificate.
The `ssl_cert_file` parameter in the `postgresql.conf` file points to this file.
-These files are automatically renewed when they near their expiry date, see [PEM CA certificate renewal](#pem-certificate-renewal).
+These files are automatically renewed when they near their expiry date. See [PEM CA certificate renewal](#pem-certificate-renewal).
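To check how long remains before renewal takes place, you can inspect the certificate's expiry date directly. For example, assuming the example data directory used elsewhere on this page:

```shell
# Print the expiry (notAfter) date of the PEM CA certificate.
openssl x509 -enddate -noout -in /var/lib/edb/as/data/ca_certificate.crt
```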
### Certificates and key files for PEM agents
Each agent's SSL certificate and keys are generated during [agent registration](../registering_agent).
The PEM agent connects to the PEM backend database server using the libpq interface, acting as a client of the backend database server.
-The PEM agent connect to the server using the `cert` auth method and with ssl enabled.
-This means that the connection is encrypted using the agent's key and authenticated using the agent's certificate (rather than a password, for example).
+The PEM agent connects to the server using the `cert` auth method and with SSL enabled.
+This means that the connection is encrypted using the agent's key and authenticated using the agent's certificate instead of, for example, a password.
Each agent has a unique identifier, and the agent certificates and keys have the corresponding identifier.
-If required, you can use the same certificate for all agents rather than one certificate per agent. For more information, see [Generate common agent certificate and key pair](#generate-a-common-agent-certificate-and-key-pair).
+If required, you can use the same certificate for all agents rather than one certificate per agent. For more information, see [Generate a common agent certificate and key pair](#generate-a-common-agent-certificate-and-key-pair).
-For more information on using the SSL certificates to connect in Postgres, see [Securing TCP/IP connections with SSL](https://www.postgresql.org/docs/current/ssl-tcp.html).
+For more information on using the SSL certificates to connect in Postgres, see [Securing TCP/IP connections with SSL](https://www.postgresql.org/docs/current/ssl-tcp.html) in the Postgres documentation.
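As an illustration of what this means at the libpq level, a certificate-based connection resembles the following sketch. The host, port, user, and file paths are placeholders; the agent supplies its real values from `agent.cfg`.

```shell
# Sketch of a libpq certificate connection like the one the agent makes.
# All values shown are illustrative placeholders.
psql "host=pem-server port=5444 user=agent1 dbname=pem sslmode=verify-ca sslcert=/root/.postgresql/agent1.crt sslkey=/root/.postgresql/agent1.key"
```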
### PEM certificate renewal
-SSL certificates have an expiry date. If you are using certificates and keys generated by PEM, they are automatically replaced before expiring.
+SSL certificates have an expiry date. If you're using certificates and keys generated by PEM, PEM replaces them before they expire.
The PEM agent installed with the PEM server monitors the expiration date of the `ca_certificate.crt` file. When the certificate is about to expire, PEM:
-- Makes a backup of the existing certificate files
-- Creates new certificate files and appends the new CA certificate file to the `root.crt` file on the PEM server
-- Creates a job to renew the certificate file for any active agents
-- Restarts the PEM server
+- Makes a backup of the existing certificate files.
+- Creates new certificate files and appends the new CA certificate file to the `root.crt` file on the PEM server.
+- Creates a job to renew the certificate file for any active agents.
+- Restarts the PEM server.
!!! Important
-If you choose to either provide your own certificates, or use a single certificate for all agents, you should disable the automatic renewal job.
+If you choose to provide your own certificates or use a single certificate for all agents, disable the automatic renewal job.
On the PEM server, execute the following SQL:
```sql
@@ -136,7 +136,7 @@ WHERE jobname = 'Check CA certificate expiry';
```
!!!
-If you need to regenerate the server or agent certificates manually, please see:
+If you need to regenerate the server or agent certificates manually, see:
- [Regenerating the server SSL certificates](replacing_ssl_certificates)
- [Regenerating agent SSL certificates](regenerating_agent_certificates)
@@ -146,7 +146,7 @@ By creating and using a single Postgres user for all PEM agents rather than one
Create a user, generate an agent certificate and key pair, and use them for all PEM agents.
-1. Create one common agent user in the PEM backend database. Grant the `pem_agent` role to the user.
+1. Create one common agent user in the PEM backend database. Grant the pem_agent role to the user.
```shell
# Running as enterprisedb
@@ -176,7 +176,7 @@ Create a user, generate an agent certificate and key pair, and use them for all
openssl x509 -req -days 365 -in agent.csr -CA ca_certificate.crt -CAkey ca_key.key -CAcreateserial -out agent.crt
```
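Optionally, before distributing the files, you can confirm that the signed certificate chains back to the PEM CA. On success, this check prints `agent.crt: OK`:

```shell
# Optional sanity check: verify the new certificate against the signing CA.
openssl verify -CAfile ca_certificate.crt agent.crt
```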
-1. Change the permissions on the `agent.crt` and `agent.key` file:
+1. Change the permissions on the `agent.crt` and `agent.key` files:
```shell
chmod 600 agent.crt agent.key
@@ -209,7 +209,7 @@ Create a user, generate an agent certificate and key pair, and use them for all
- To replace the agent certificate and key pair with the registered agent.
- a. Edit the `agent_user`, `agent_ssl_key`, and `agent_ssl_crt` parameters in `agent.cfg` file of the agent host:
+ a. Edit the `agent_user`, `agent_ssl_key`, and `agent_ssl_crt` parameters in the `agent.cfg` file of the agent host:
```shell
vi /usr/edb/pem/agent/etc/agent.cfg
@@ -262,7 +262,7 @@ After obtaining the trusted CA certificates and keys, replace the [server](#repl
1. Ask your CA to sign the CSR and generate the server certificate for you.
-1. Verify the details of the new server certificate aren't tampered with and match your provided details:
+1. Verify that the details of the new server certificate haven't been tampered with and match the details you provided:
```shell
openssl x509 -noout -text -in server.crt
@@ -277,8 +277,8 @@ After obtaining the trusted CA certificates and keys, replace the [server](#repl
1. If the trusted CA doesn't provide a CRL, disable CRL usage by the server. To do so, comment out the `ssl_crl_file` parameter in the `postgresql.conf` file.
!!! Note
- If you accidentally leave a CRL from a previous CA in place and do not comment out `ssl_crl_file`, the server will start but authentication will fail with an SSL error message `tlsv1 alert unknown ca`.
- The error doesn't specify that the CRL is the cause, so this can be difficult to debug if encountered out of context.
+ If you leave a CRL from a previous CA in place and don't comment out `ssl_crl_file`, the server will start. However, authentication will fail with an SSL error message: `tlsv1 alert unknown ca`.
+ The error doesn't specify that the CRL is the cause, so this issue can be difficult to debug if encountered out of context.
1. Copy the new `root.crt`, `server.key`, and `server.crt` files to the data directory of the backend database server:
@@ -286,7 +286,7 @@ After obtaining the trusted CA certificates and keys, replace the [server](#repl
cp root.crt server.key server.crt /var/lib/edb/as/data
```
-1. Change the owner and permissions of the new certificates and key files to be the same as the data directory:
+1. Change the owner and permissions of the new certificates and key files to match those of the data directory:
```shell
cd /var/lib/edb/as/data/
@@ -369,7 +369,7 @@ Replace the agent SSL certificates only after replacing the server certificates
Use the Services applet to restart the PEM agent. The PEM agent service is named Postgres Enterprise Manager Agent. Select the service name in the Services dialog box, and select **Restart the service**.
!!! Note
-For agents registered after following the process above you can provide a certificate to the agent at the time of registration as shown in the [second example](/pem/latest/registering_agent/#overriding-default-configurations---examples).
+For agents registered after following the preceding process, you can provide a certificate to the agent at the time of registration as shown in the [second example](/pem/latest/registering_agent/#overriding-default-configurations---examples).
!!!
!!!note
@@ -393,7 +393,7 @@ This command returns `agent1.crt: OK` on success or an explanatory message on fa
### Make a test connection to the PEM backend database
-To verify whether the agent user can connect using a certificate, on the server where the agent is located, execute the following commands as root:
+To verify whether the agent user can connect using a certificate, as root on the server where the agent is located, execute:
```shell
PGHOST=
@@ -407,6 +407,7 @@ export PGHOST PGPORT PGUSER PGSSLCERT PGSSLKEY PGSSLMODE
-A -t -c "SELECT version()"
```
+
Where:
- `` is the full path to the psql executable, for example `/usr/edb/as15/bin/psql`.
- `` is the hostname or IP address of the PEM server.
@@ -414,7 +415,7 @@ Where:
- `` is the ID of the agent you're testing, as defined in the file `/usr/edb/pem/agent/etc/agent.cfg`.
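For example, with purely illustrative values (substitute your own host, port, agent ID, file paths, and the `PGSSLMODE` your server requires):

```shell
# All values are illustrative placeholders; adjust them for your environment.
PGHOST=192.0.2.10
PGPORT=5444
PGUSER=agent1
PGSSLCERT=/root/.postgresql/agent1.crt
PGSSLKEY=/root/.postgresql/agent1.key
PGSSLMODE=verify-ca
export PGHOST PGPORT PGUSER PGSSLCERT PGSSLKEY PGSSLMODE
/usr/edb/as15/bin/psql -A -t -c "SELECT version()"
```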
!!! Note
-If you used the instructions in [Generate a common agent certificate and key pair](#generate-a-common-agent-certificate-and-key-pair)
+If you used the instructions in [Generate a common agent certificate and key pair](#generate-a-common-agent-certificate-and-key-pair),
you must set `PGUSER` to the common agent username.
!!!
From bd8e5fa3cc5f203eb44a9f698c4e81643e7f93eb Mon Sep 17 00:00:00 2001
From: Betsy Gitelman
Date: Fri, 25 Apr 2025 16:17:31 -0400
Subject: [PATCH 02/42] First pass at second batch
---
.../regenerating_agent_certificates.mdx | 10 ++---
.../replacing_ssl_certificates.mdx | 7 ++-
.../docs/pem/10/changing_default_port.mdx | 2 +-
.../configuring_2fa_authentication.mdx | 4 +-
..._server_to_use_kerberos_authentication.mdx | 20 ++++-----
..._server_to_use_windows_kerberos_server.mdx | 14 +++---
.../authentication_options/index.mdx | 2 +-
.../10/considerations/pem_pgbouncer/index.mdx | 4 +-
.../preparing_the_pem_database_server.mdx | 14 +++---
.../apache_httpd_security_configuration.mdx | 44 +++++++++----------
10 files changed, 60 insertions(+), 61 deletions(-)
diff --git a/product_docs/docs/pem/10/certificates/regenerating_agent_certificates.mdx b/product_docs/docs/pem/10/certificates/regenerating_agent_certificates.mdx
index aa6bf4c33a8..2dba19c4882 100644
--- a/product_docs/docs/pem/10/certificates/regenerating_agent_certificates.mdx
+++ b/product_docs/docs/pem/10/certificates/regenerating_agent_certificates.mdx
@@ -6,13 +6,13 @@ redirects:
---
!!! Important
-These steps are automatically performed by default when the certificates are nearing expiry.
-These instructions are provided for completeness incase you need to manually regenerate the PEM certificates and keys.
+PEM performs these steps by default when the certificates are nearing expiry.
+These instructions are provided for completeness in case you need to manually regenerate the PEM certificates and keys.
!!!
You need to regenerate the agent certificates and key files:
-- If the PEM server certificates are regenerated
-- If the PEM agent certificates are near expiring
+- If the PEM server certificates are regenerated.
+- If the PEM agent certificates are nearing expiry.
You must regenerate a certificate and a key for each agent interacting with the PEM server and copy it to the agent.
@@ -66,7 +66,7 @@ To generate a PEM agent certificate and key file pair:
Where `-req` indicates the input is a CSR. The `-CA` and `-CAkey` options specify the root certificate and private key to use for signing the CSR.
- Before generating the next certificate and key file pair, move the `agent.key` and `agent.crt` files generated in the steps 2 and 4 on their respective PEM agent host.
+   Before generating the next certificate and key file pair, move the `agent.key` and `agent.crt` files generated in steps 2 and 4 to their respective PEM agent host.
6. Change the permission on the new `agent.crt` and `agent.key` file:
diff --git a/product_docs/docs/pem/10/certificates/replacing_ssl_certificates.mdx b/product_docs/docs/pem/10/certificates/replacing_ssl_certificates.mdx
index 13fc858973f..af690688362 100644
--- a/product_docs/docs/pem/10/certificates/replacing_ssl_certificates.mdx
+++ b/product_docs/docs/pem/10/certificates/replacing_ssl_certificates.mdx
@@ -8,8 +8,8 @@ redirects:
If the PEM backend database server certificates are near expiring, plan to regenerate the certificates and key files.
!!! Important
-By default, these steps are performed automatically when the certificates are nearing expiry.
-These instructions are provided for completeness if incase you need to manually regenerate the PEM certificates and keys.
+PEM performs these steps by default when the certificates are nearing expiry.
+These instructions are provided for completeness in case you need to manually regenerate the PEM certificates and keys.
!!!
To replace the SSL certificates:
@@ -105,7 +105,7 @@ To replace the SSL certificates:
openssl req -new -key server.key -out server.csr -subj '/C=IN/ST=MH/L=Pune/O=EDB/CN=PEM'
```
- Where `-subj` is provided as per your requirements. You define `CN` asthe hostname/domain name of the PEM server host.
+ Where `-subj` is provided as per your requirements. You define `CN` as the hostname/domain name of the PEM server host.
1. Use the `openssl x509` command to sign the CSR and generate a server certificate. Move the `server.crt` to the data directory of the backend database server:
@@ -132,4 +132,3 @@ To replace the SSL certificates:
Restarting the backend database server restarts the PEM server.
1. Regenerate each PEM agent's SSL certificates. For more information, see [Regenerating agent SSL certificates](regenerating_agent_certificates).
-
diff --git a/product_docs/docs/pem/10/changing_default_port.mdx b/product_docs/docs/pem/10/changing_default_port.mdx
index 2476c46ac43..1e291056d7e 100644
--- a/product_docs/docs/pem/10/changing_default_port.mdx
+++ b/product_docs/docs/pem/10/changing_default_port.mdx
@@ -2,7 +2,7 @@
title: "Changing the default port"
---
-By default, the 8443 port is assigned for the web services at the time of configuration of the PEM server.
+By default, port 8443 is assigned for the web services when the PEM server is configured.
You can change the port after configuration by changing a few parameters in the web server configuration files.
The names and locations of these files are platform specific.
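For example, on a RHEL-like system running NGINX, a sketch of locating the port directive and then reloading the configuration (the file path is the RHEL-like NGINX default; adjust it for your platform):

```shell
# Find the current listen directives (path is the RHEL-like NGINX default).
grep -n 'listen' /etc/nginx/conf.d/edb-pem.conf
# After editing the port, validate the configuration and reload the web server.
nginx -t && systemctl reload nginx
```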
diff --git a/product_docs/docs/pem/10/considerations/authentication_options/configuring_2fa_authentication.mdx b/product_docs/docs/pem/10/considerations/authentication_options/configuring_2fa_authentication.mdx
index 713c93d333f..f75bfe78ed9 100644
--- a/product_docs/docs/pem/10/considerations/authentication_options/configuring_2fa_authentication.mdx
+++ b/product_docs/docs/pem/10/considerations/authentication_options/configuring_2fa_authentication.mdx
@@ -8,7 +8,7 @@ redirects:
---
-PEM supports two methods for 2FA:
+PEM supports two methods for two-factor authentication (2FA):
- Email authentication
- Authenticator app (such as Google Authenticator)
@@ -17,7 +17,7 @@ To enable 2FA, you can copy these settings from the `config.py` file to the `con
| Parameter | Description |
| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| MFA_ENABLED | Set to `true` to enable the two-factor authentication. Default value is `false`. |
+| MFA_ENABLED | Set to `true` to enable two-factor authentication. Default value is `false`. |
| MFA_FORCE_REGISTRATION | Set to `true` to force users to register for the two-factor authentication methods at login. Default value is `false`.                                              |
| MFA_SUPPORTED_METHODS | Set to `email` to use the email authentication method (send a one-time code by email) or `authenticator` to use the TOTP-based application authentication method. |
| MFA_EMAIL_SUBJECT | Set to the subject of the email for email authentication. Default value is ` - Verification Code`. |
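As a sketch only, enabling the authenticator-app method might involve appending settings like these to `config_local.py` (the path is the Linux default; the list form of `MFA_SUPPORTED_METHODS` is an assumption):

```shell
# Illustrative sketch: append 2FA settings to config_local.py on Linux.
cat >> /usr/edb/pem/web/config_local.py <<'EOF'
MFA_ENABLED = True
MFA_FORCE_REGISTRATION = False
MFA_SUPPORTED_METHODS = ['authenticator']
EOF
```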
diff --git a/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_kerberos_authentication.mdx b/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_kerberos_authentication.mdx
index 8b296f9eb25..6b8b02069c6 100644
--- a/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_kerberos_authentication.mdx
+++ b/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_kerberos_authentication.mdx
@@ -17,13 +17,13 @@ For example, if the realm on Kerberos server is `edbpem.org`, then you can set t
## 1. Install Kerberos, the PEM server, and the PEM backend database
-Install Kerberos on the machine that functions as the authentication server. Install the PEM server on a separate machine. For more information, see [Installing the PEM Server](../../installing/).
+Install Kerberos on the machine that functions as the authentication server. Install the PEM server on a separate machine. For more information, see [Installing the PEM server](../../installing/).
Install the PEM backend database (Postgres/EDB Postgres Advanced Server) on the same machine as the PEM server or on a different one. For more information, see the installation steps on the [EDB Docs website](https://www.enterprisedb.com/docs).
## 2. Add principals on Kerberos server
-Add the principals for the PEM web application deployed under an Apache web server (HTTPD/Apache2) and the PEM Backend Database Server (PostgreSQL/EDB Postgres Advanced Server).
+Add the principals for the PEM web application deployed under an Apache web server (HTTPD/Apache2) and the PEM backend database server (PostgreSQL/EDB Postgres Advanced Server).
```shell
$ sudo kadmin.local -q "addprinc -randkey HTTP/"
@@ -109,7 +109,7 @@ Restart the database server to reflect the changes:
systemctl restart
```
-`POSTGRES_SERVICE_NAME` is the service name of the Postgres (PostgreSQL/EDB Postgres Advanced Server) database, for example, `postgresql-13` for PostgreSQL 13 database on a `RHEL` or Rocky Linux platforms.
+`POSTGRES_SERVICE_NAME` is the service name of the Postgres (PostgreSQL/EDB Postgres Advanced Server) database, for example, `postgresql-13` for a PostgreSQL 13 database on a RHEL or Rocky Linux platform.
## 5. Obtain and view the initial ticket
@@ -128,7 +128,7 @@ $ klist
It displays the principal along with the Kerberos ticket.
!!! Note
- The `USERNAME@REALM` specified here must be a database user having the pem_admin role and CONNECT privilege on `pem` database.
+ The `USERNAME@REALM` specified here must be a database user having the pem_admin role and CONNECT privilege on the `pem` database.
## 6. Configure the PEM server
@@ -158,13 +158,13 @@ If the PEM server uses Kerberos authentication:
- All the authenticated user principals are appended with the realm (USERNAME@REALM) and passed as the database user name by default. To override the default, in the `config_local.py` file, add the parameter `PEM_USER_KRB_INCLUDE_REALM` and set it to `False`.
-- Restart the Apache server
+- Restart the Apache server:
```shell
sudo systemctl restart
```
-- Edit the entries at the top of `pg_hba.conf` to use the gss authentication method, and reload the database server.
+- Edit the entries at the top of `pg_hba.conf` to use the gss authentication method, and reload the database server:
```shell
host pem +pem_user /32 gss
@@ -178,7 +178,7 @@ If the PEM server uses Kerberos authentication:
`POSTGRES_SERVICE_NAME` is the service name of the Postgres (PostgreSQL/EDB Postgres Advanced Server) database, for example, `postgresql-13` for PostgreSQL 13 database on a `RHEL` or Rocky Linux platforms.
!!! Note
- If you're using PostgreSQL or EDB Postgres Advanced Server 12 or later, then you can specify connection type as `hostgssenc` to allow only gss-encrypted connection.
+    If you're using PostgreSQL or EDB Postgres Advanced Server 12 or later, then you can specify the connection type as `hostgssenc` to allow only gss-encrypted connections.
## 7. Browser settings
@@ -189,9 +189,9 @@ For Mozilla Firefox:
1. Open the low-level Firefox configuration page by loading the `about:config` page.
1. In the search box, enter `network.negotiate-auth.trusted-uris`.
-1. Double-click the `network.negotiate-auth.trusted-uris` preference and enter the hostname or the domain of the web server that's protected by Kerberos HTTP SPNEGO. Separate multiple domains and hostnames with a comma.
+1. Double-click the `network.negotiate-auth.trusted-uris` preference and enter the hostname or the domain of the web server that's protected by Kerberos HTTP SPNEGO. Separate multiple domains and hostnames with commas.
1. In the search box, enter `network.negotiate-auth.delegation-uris`.
-1. Double-click the `network.negotiate-auth.delegation-uris` preference and enter the hostname or the domain of the web server that's protected by Kerberos HTTP SPNEGO. Separate multiple domains and hostnames with a comma.
+1. Double-click the `network.negotiate-auth.delegation-uris` preference and enter the hostname or the domain of the web server that's protected by Kerberos HTTP SPNEGO. Separate multiple domains and hostnames with commas.
1. Select **OK**.
For Google Chrome on Linux or MacOS:
@@ -215,4 +215,4 @@ For Google Chrome on Linux or MacOS:
`psql: GSSAPI continuation error: Unspecified GSS failure. Minor code may provide more information`
`GSSAPI continuation error: Key version is not available`
- Add encryption types to the keytab using ktutil or by recreating the Postgres keytab with all crypto systems from AD.
\ No newline at end of file
+ Add encryption types to the keytab using ktutil or by re-creating the Postgres keytab with all crypto systems from AD.
\ No newline at end of file
diff --git a/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_windows_kerberos_server.mdx b/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_windows_kerberos_server.mdx
index c7b270895d4..bcdbe83efbd 100644
--- a/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_windows_kerberos_server.mdx
+++ b/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_windows_kerberos_server.mdx
@@ -5,7 +5,7 @@ redirects:
- /pem/latest/pem_inst_guide_linux/04_installing_postgres_enterprise_manager/07_configuring_the_pem_server_to_use_windows_kerberos_server/
---
-The Windows Active Directory domain service works with hostnames and not with IP addresses. To use single sign-on in PEM Server using Active Directory domain services, configure the following machines with hostnames using the DNS:
+The Windows Active Directory domain service works with hostnames and not with IP addresses. To use single sign-on in the PEM server using Active Directory domain services, configure the following machines with hostnames using DNS:
- Windows server (domain controller)
- PEM server (PEM web server and PEM backend database server)
@@ -33,7 +33,7 @@ Create users in Active Directory of the Windows server to map with the HTTP serv
1. Enter the user details.
-1. Give the password and make sure to clear **User must change password at next logon**. Also select **User cannot change password** and **Password never expires**.
+1. Enter the password and make sure to clear **User must change password at next logon**. Also select **User cannot change password** and **Password never expires**.
1. Review the user details.
@@ -41,7 +41,7 @@ Create users in Active Directory of the Windows server to map with the HTTP serv

-1. Create the user (for example, pemserverdb) in Active Cirectory of the Windows server to map with the Postgres service principal for the PEM backend database.
+1. Create the user (for example, pemserverdb) in Active Directory of the Windows server to map with the Postgres service principal for the PEM backend database.
## 3. Extract key tables from Active Directory
@@ -160,14 +160,14 @@ Run the PEM configure script on the PEM server to use Kerberos authentication:
$ sudo PEM_APP_HOST=pem.edbpem.internal PEM_KRB_KTNAME=/bin/configure-pem-server.sh
```
-In the `config_setup.py` file, configure `PEM_DB_HOST` and check that the value of `PEM_AUTH_METHOD` is set to `'kerberos'`.
+In the `config_setup.py` file, configure `PEM_DB_HOST` and check that the value of `PEM_AUTH_METHOD` is set to `'kerberos'`:
```shell
$ sudo vim /share/web/config_setup.py
PEM_DB_HOST=`pem.edbpem.internal`
```
-Configure `HOST` in the `.install-config` file.
+Configure `HOST` in the `.install-config` file:
```shell
$ sudo vim /share/.install-config
@@ -200,7 +200,7 @@ Edit the entries at the top in `pg_hba.conf` to use the gss authentication metho
`POSTGRES_SERVICE_NAME` is the service name of the Postgres (PostgreSQL/EDB Postgres Advanced Server) database, for example, `postgresql-13` for PostgreSQL 13 database on RHEL or Rocky Linux platforms.
!!! Note
- You can't specify the connection type as `hostgssenc`. Windows doesn't support gss encrypted connection.
+    You can't specify the connection type as `hostgssenc`. Windows doesn't support gss-encrypted connections.
## 7. Browser settings
@@ -236,4 +236,4 @@ For Google Chrome on Linux or MacOS:
`psql: GSSAPI continuation error: Unspecified GSS failure. Minor code may provide more information`
`GSSAPI continuation error: Key version is not available`
- Add encryption types to the keytab using ktutil or by recreating the Postgres keytab with all crypto systems from AD.
+ Add encryption types to the keytab using ktutil or by re-creating the Postgres keytab with all crypto systems from AD.
diff --git a/product_docs/docs/pem/10/considerations/authentication_options/index.mdx b/product_docs/docs/pem/10/considerations/authentication_options/index.mdx
index f10d1793733..8249b9e09a5 100644
--- a/product_docs/docs/pem/10/considerations/authentication_options/index.mdx
+++ b/product_docs/docs/pem/10/considerations/authentication_options/index.mdx
@@ -9,7 +9,7 @@ navigation:
---
-PEM also supports Kerberos and 2FA authentication. For implementation instructions, see:
+PEM also supports Kerberos and two-factor authentication. For implementation instructions, see:
On Linux:
diff --git a/product_docs/docs/pem/10/considerations/pem_pgbouncer/index.mdx b/product_docs/docs/pem/10/considerations/pem_pgbouncer/index.mdx
index 1208d029130..c134f8b5543 100644
--- a/product_docs/docs/pem/10/considerations/pem_pgbouncer/index.mdx
+++ b/product_docs/docs/pem/10/considerations/pem_pgbouncer/index.mdx
@@ -20,8 +20,8 @@ navigation:
You can use PgBouncer as a connection pooler for limiting the number of connections from the PEM agent to the Postgres Enterprise Manager (PEM) server on non-Windows machines:
- [PEM server and agent connection management mechanism](pem_server_pem_agent_connection_management_mechanism) provides an introduction to the PgBouncer-PEM infrastructure.
-- [Preparing the PEM database server](preparing_the_pem_database_server) provides information about preparing the PEM database server to be used with PgBouncer.
+- [Preparing the PEM database server](preparing_the_pem_database_server) provides information about preparing the PEM database server for use with PgBouncer.
- [Configuring PgBouncer](configuring_pgBouncer) provides detailed information about configuring PgBouncer to allow it to work with the PEM database server.
- [Configuring the PEM agent](configuring_the_pem_agent) provides detailed information about configuring a PEM agent to connect to PgBouncer.
-For detailed information about using the PEM web interface, see the [Accessing the web interface](../../pem_web_interface).
\ No newline at end of file
+For detailed information about using the PEM web interface, see [Accessing the web interface](../../pem_web_interface).
\ No newline at end of file
diff --git a/product_docs/docs/pem/10/considerations/pem_pgbouncer/preparing_the_pem_database_server.mdx b/product_docs/docs/pem/10/considerations/pem_pgbouncer/preparing_the_pem_database_server.mdx
index d5ccc809c33..4fa7b618ebe 100644
--- a/product_docs/docs/pem/10/considerations/pem_pgbouncer/preparing_the_pem_database_server.mdx
+++ b/product_docs/docs/pem/10/considerations/pem_pgbouncer/preparing_the_pem_database_server.mdx
@@ -26,7 +26,7 @@ This example shows how to prepare the PEM database server with the enterprisedb
## Creating users and roles for PgBouncer-PEM connections
-1. Create a dedicated user named pgbouncer with `pem_agent_pool` membership. This user will serve connections from PgBouncer to the PEM database by forwarding all agent database queries.
+1. Create a dedicated user named pgbouncer with pem_agent_pool membership. This user will serve connections from PgBouncer to the PEM database by forwarding all agent database queries.
```sql
CREATE ROLE pgbouncer PASSWORD 'ANY_PASSWORD' LOGIN;
@@ -84,7 +84,7 @@ This example shows how to prepare the PEM database server with the enterprisedb
GRANT
```
-1. Use the `pem.create_proxy_agent_user(varchar)` function to create a user named pem_agent_user1. This proxy user will serve connections between all Agents and PgBouncer.
+1. Use the `pem.create_proxy_agent_user(varchar)` function to create a user named pem_agent_user1. This proxy user will serve connections between all agents and PgBouncer.
```sql
SELECT pem.create_proxy_agent_user('pem_agent_user1');
@@ -98,9 +98,9 @@ This example shows how to prepare the PEM database server with the enterprisedb
## Updating the configuration files to allow PgBouncer-PEM connections
-1. Allow the pgbouncer user to connect to the `pem` database using the SSL authentication method by adding the `hostssl pem` entry in the `pg_hba.conf` file of the PEM database server.
+1. Allow the pgbouncer user to connect to the `pem` database using the SSL authentication method. To do so, add the `hostssl pem` entry in the `pg_hba.conf` file of the PEM database server.
- In the list of rules, ensure you place the `hostssl pem` entry before any other rules assigned to the `+pem_agent` user.
+ In the list of rules, be sure to place the `hostssl pem` entry before any other rules assigned to the +pem_agent user.
```shell
# Allow the PEM agent proxy user (used by pgbouncer)
@@ -149,7 +149,7 @@ This example runs EDB Postgres Advanced Server on RHEL. When setting your enviro
-1. Set the `$USER_HOME` environment variable to the home directory accesible to the user:
+1. Set the `$USER_HOME` environment variable to the home directory accessible to the user:
```shell
export USER_HOME=/var/lib/edb
@@ -187,9 +187,9 @@ This example runs EDB Postgres Advanced Server on RHEL. When setting your enviro
openssl x509 -req -days 365 -in pem_agent_pool.csr -CA $DATA_DIR/ca_certificate.crt -CAkey $DATA_DIR/ca_key.key -CAcreateserial -out pem_agent_pool.crt
```
-1. Move the created key and certificate to a path the `enterprisedb` user can access.
+1. Move the created key and certificate to a path the enterprisedb user can access.
- In this example, create a folder called `~/.postgresql` in the home directory of the `enterprisedb` user and ensure it has permissions:
+    In this example, create a folder `~/.postgresql` in the home directory of the enterprisedb user and ensure it has the correct permissions:
```
mkdir -p $USER_HOME/.postgresql
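# Illustrative follow-up: give the folder to the enterprisedb user and
# restrict access, matching libpq's expectations for ~/.postgresql files.
chown enterprisedb:enterprisedb $USER_HOME/.postgresql
chmod 700 $USER_HOME/.postgresql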
diff --git a/product_docs/docs/pem/10/considerations/pem_security_best_practices/apache_httpd_security_configuration.mdx b/product_docs/docs/pem/10/considerations/pem_security_best_practices/apache_httpd_security_configuration.mdx
index 4d2c4f81ce6..2dcde8c6c57 100644
--- a/product_docs/docs/pem/10/considerations/pem_security_best_practices/apache_httpd_security_configuration.mdx
+++ b/product_docs/docs/pem/10/considerations/pem_security_best_practices/apache_httpd_security_configuration.mdx
@@ -8,19 +8,19 @@ redirects:
- /pem/latest/installing_pem_server/pem_security_best_practices/apache_httpd_security_configuration/
---
-This page details how to secure the PEM web server.
+Configure the security of the PEM web server.
On Windows, the supported web server is Apache HTTPD. Apache HTTPD is bundled with PEM under the name PEM HTTPD.
-The Apache HTTPD configuration file is `pem.conf` and the SSL configuration file is `httpd-ssl-pem.conf`. Both configuration files are in the `/conf/addons` directory.
+The Apache HTTPD configuration file is `pem.conf`, and the SSL configuration file is `httpd-ssl-pem.conf`. Both configuration files are in the `/conf/addons` directory.
On Linux, both NGINX and Apache HTTPD are supported.
The NGINX configuration file is `/etc/nginx/conf.d/edb-pem.conf` on RHEL-like systems and `/etc/nginx/sites-available/edb-pem.conf` on Debian-like systems.
-the Apache HTTPD configuration file is `edb-pem.conf` and the SSL configuration file is `edb-ssl-pem.conf`. Both configurations files are in the `/conf.d` directory.
+The Apache HTTPD configuration file is `edb-pem.conf`, and the SSL configuration file is `edb-ssl-pem.conf`. Both configuration files are in the `/conf.d` directory.
## Recommendations applied by default
These recommendations are applied by default in new installations of PEM.
-If you have customized your web server configuration, or carried it over from a much older version of PEM, you can use this information to verify that your configuration meets current standards.
+If you customized your web server configuration or carried it over from a much older version of PEM, you can use this information to verify that your configuration meets current standards.
### Disable insecure SSL and TLS protocols
@@ -42,8 +42,8 @@ SSLProtocol -All TLSv1.2
SSLProxyProtocol -All TLSv1.2
```
-You can verify that TLS 1.1 is disabled using the following command, replacing the URL with that of your web server.
-A return value of 35 means TLS 1.1 is disabled whereas 0 means it is enabled.
+You can verify that TLS 1.1 is disabled using the following command. Replace the URL with your web server's.
+A return value of 35 means TLS 1.1 is disabled; 0 means it's enabled.
```shell
curl -k -v -s --tls-max 1.1 https://pem-server:8443 >/dev/null 2>&1; echo $?
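# For comparison (illustrative): the same probe with --tls-max 1.2 is expected
# to exit 0, because TLS 1.2 remains enabled by the configuration above.
curl -k -v -s --tls-max 1.2 https://pem-server:8443 >/dev/null 2>&1; echo $?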
@@ -51,7 +51,7 @@ curl -k -v -s --tls-max 1.1 https://pem-server:8443 >/dev/null 2>&1; echo $?
### Disable web server information exposure
-In new installations of PEM, the web server is configured to minimize the information about the server exposed to clients by disabling server tokens (which expose information about the server in response headers) and server signatures (which expose information in the footers server-generated pages such as error messages).
+In new installations of PEM, the web server is configured to minimize the information about the server exposed to clients by disabling server tokens and server signatures. Server tokens expose information about the server in response headers. Server signatures expose information in the footers of server-generated pages such as error messages.
For NGINX, PEM adds the following line to the configuration file:
@@ -86,7 +86,7 @@ For Apache HTTPD, PEM sets setting the `Options -Indexes` directive:
The TRACE and TRACK HTTP methods are used for debugging servers. When an HTTP TRACE request is sent to a supported web server, the server responds and echoes the data passed to it, including any HTTP headers. We recommend that you disable these methods in the Apache configuration.
In NGINX, TRACK and TRACE methods are disabled by default. In Apache HTTPD, PEM includes the following lines in the configuration file to reject these methods.
-Note that some scanners do not understand this syntax, so may incorrectly report that these methods are allowed.
+Some scanners don't understand this syntax and may incorrectly report that these methods are allowed.
```shell
RewriteEngine on
@@ -94,8 +94,8 @@ RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK|OPTIONS)
RewriteRule .\* - [F]
```
-You can verify that TRACK and TRACE are disabled with the following commands replacing the URL with that of your web server.
-A return value of 35 means TLS 1.1 is disabled whereas 0 means it is enabled. If the methods are disabled, the command will return an HTML response including the text `405 Method Not Allowed` or similar.
+You can verify that TRACK and TRACE are disabled with the following commands. Replace the URL with your web server's.
+If the methods are disabled, the command returns an HTML response that includes the text `405 Method Not Allowed` or similar.
```shell
curl -kL -X TRACK https://pem-server:8443/pem
@@ -108,14 +108,14 @@ PEM sets various HTTP header options to improve security.
These settings are defined in the `config.py` and `config_distro.py` files.
These files are located at `/usr/edb/pem/web` on Linux and at `C:\ProgramFiles\edb\pem\server\share\web` on Windows.
-If you wish to alter any of these settings, you should not edit these files, but instead create (or edit if it already exists) a file named `config_local.py` in the same location and add your desired settings.
-These settings will override those in the `config.py` and and `config_distro.py` files and will not be overwritten during a PEM upgrade.
+If you want to alter any of these settings, don't edit these files. Instead, create a file named `config_local.py` in the same location (or edit it if it already exists) and add your desired settings.
+These settings override those in the `config.py` and `config_distro.py` files. They aren't overwritten during a PEM upgrade.
For detailed information on the `config.py` file, see [Managing configuration settings](../../managing_configuration_settings/).
#### X-Frame-Options
-X-Frame-Options indicate whether a browser is allowed to render a page in an <iframe> tag. It specifically protects against clickjacking. PEM has a host validation `X_FRAME_OPTIONS` option to prevent these attacks, which you can configure in the `config_local.py` file. The default is:
+X-Frame-Options indicates whether a browser is allowed to render a page in an `<iframe>` tag. It specifically protects against clickjacking. PEM has a host validation `X_FRAME_OPTIONS` option to prevent these attacks, which you can configure in the `config_local.py` file. The default is:
```ini
X_FRAME_OPTIONS = "SAMEORIGIN"
@@ -123,7 +123,7 @@ X_FRAME_OPTIONS = "SAMEORIGIN"
#### Content-Security-Policy
-Content-Security-Policy is part of the HTML5 standard. It provides a broader range of protection than the X-Frame-Options header, which it replaces. It is designed so that website authors can whitelist domains. The authors can load resources (like scripts, stylesheets, and fonts) from the whitelisted domains and also from domains that can embed a page.
+Content-Security-Policy is part of the HTML5 standard. It provides a broader range of protection than the X-Frame-Options header, which it replaces. It's designed so that website authors can whitelist domains. The authors can load resources (like scripts, stylesheets, and fonts) from the whitelisted domains and also from domains that can embed a page.
PEM has a host validation `CONTENT_SECURITY_POLICY` option to prevent attacks, which you can configure in the `config_local.py` file. The default is:
@@ -185,33 +185,33 @@ Cookies are small packets of data that a server sends to your browser to store c
To apply the changes, restart the web server.
- For detailed information on `config.py` file, see [Managing Configuration Settings](../../managing_configuration_settings/).
+ For detailed information on the `config.py` file, see [Managing configuration settings](../../managing_configuration_settings/).
## Additional recommendations that can be applied manually
-These recommendations are not applied automatically because they require additional information or action specific to the environment in which PEM is deployed.
+These recommendations aren't applied automatically because they require additional information or action specific to the environment in which PEM is deployed.
### Secure HTTPD with SSL certificates
During PEM configuration, a self-signed certificate is generated to secure traffic between the web server and clients.
-To enhance security and to prevent browser warnings that the site is not secure, we recommend that you [replace this certificate with one signed by a trusted certificate authority](../../certificates/index.mdx/#web-server-certificates).
+To enhance security and to prevent browser warnings that the site isn't secure, we recommend that you [replace this certificate with one signed by a trusted certificate authority](../../certificates/index.mdx/#web-server-certificates).
### Run the web server from a non-privileged user account
-On Linux, PEM utilizes web server packages provided by the OS. Typically, these create a service unit which runs the web server as the root user.
+On Linux, PEM uses web server packages provided by the OS. Typically, these create a service unit that runs the web server as the root user.
Running the web server as a root user can create a security issue. We recommend that you run the web server as a unique non-privileged user. Doing so helps to secure any other services running during a security breach.
!!! Note Variations in WSGI service by platform
-PEM runs as a WSGI application. On Linux, when the web server is NGINX, the WSGI application is run by a separate service, `edb-uwsgi`, which runs as the `pem` user.
-When the web server is Apache HTTPD, the WSGI application is run by a daemon process which is a child of the Apache HTTPD process. The daemon process is run as the `pem` user.
+PEM runs as a WSGI application. On Linux, when the web server is NGINX, the WSGI application is run by a separate service, `edb-uwsgi`, which runs as the pem user.
+When the web server is Apache HTTPD, the WSGI application is run by a daemon process that's a child of the Apache HTTPD process. The daemon process runs as the pem user.
-On Windows, the `WSGIDaemonProcess` directive and features aren't available so both the web server and the WSGI app run as the system user (the `LocalSystem` account).
+On Windows, the `WSGIDaemonProcess` directive and features aren't available, so both the web server and the WSGI app run as the system user (the `LocalSystem` account).
!!!
### Restrict the access to a network or IP address
-It is good practice to restrict access to the web server to the smallest set of IP addresses compatible with your business needs.
+It's a good practice to restrict access to the web server to the smallest set of IP addresses compatible with your business needs.
This is most commonly done at the network infrastructure level, for example through firewall configuration, but can also be enforced by the web server.
The PEM application configuration file (`/web/config_local.py`) supports an `ALLOWED_HOSTS` configuration parameter for this purpose.
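For example, a sketch that limits access to two networks (the addresses are placeholders from the documentation ranges, and the list form follows the other `config_local.py` settings described on this page):

```shell
# Illustrative sketch: restrict PEM web access to two placeholder networks.
cat >> /usr/edb/pem/web/config_local.py <<'EOF'
ALLOWED_HOSTS = ['192.0.2.0/24', '203.0.113.0/24']
EOF
```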
From 9301470b798801fb4dc77620b908209569e44317 Mon Sep 17 00:00:00 2001
From: Betsy Gitelman
Date: Mon, 28 Apr 2025 14:34:34 -0400
Subject: [PATCH 03/42] Edits to PEM10 - second round
---
.../regenerating_agent_certificates.mdx | 6 +++---
.../certificates/replacing_ssl_certificates.mdx | 4 ++--
.../configuring_2fa_authentication.mdx | 2 +-
...pem_server_to_use_kerberos_authentication.mdx | 12 ++++++------
...pem_server_to_use_windows_kerberos_server.mdx | 14 +++++++-------
.../authentication_options/index.mdx | 2 +-
.../pem_pgbouncer/configuring_pgBouncer.mdx | 4 ++--
.../10/considerations/pem_pgbouncer/index.mdx | 2 +-
.../apache_httpd_security_configuration.mdx | 16 ++++++++--------
9 files changed, 31 insertions(+), 31 deletions(-)
diff --git a/product_docs/docs/pem/10/certificates/regenerating_agent_certificates.mdx b/product_docs/docs/pem/10/certificates/regenerating_agent_certificates.mdx
index 2dba19c4882..481229cd7e3 100644
--- a/product_docs/docs/pem/10/certificates/regenerating_agent_certificates.mdx
+++ b/product_docs/docs/pem/10/certificates/regenerating_agent_certificates.mdx
@@ -16,7 +16,7 @@ You need to regenerate the agent certificates and key files:
You must regenerate a certificate and a key for each agent interacting with the PEM server and copy it to the agent.
-Each agent has a unique identifier that's stored in the pem.agent table of the pem database. You must replace the certificate and key files with the certificate or key files that corresponds to the agent's identifier.
+Each agent has a unique identifier that's stored in the pem.agent table of the `pem` database. You must replace the certificate and key files with the files that correspond to the agent's identifier.
Prerequisites:
- PEM server has certificates.
@@ -66,9 +66,9 @@ To generate a PEM agent certificate and key file pair:
Where `-req` indicates the input is a CSR. The `-CA` and `-CAkey` options specify the root certificate and private key to use for signing the CSR.
-   Before generating the next certificate and key file pair, move the `agent.key` and `agent.crt` files generated in steps 2 and 4 to their respective PEM agent host.
+   Before generating the next certificate and key-file pair, move the `agent.key` and `agent.crt` files generated in steps 2 and 4 to their respective PEM agent host.
-6. Change the permission on the new `agent.crt` and `agent.key` file:
+6. Change the permissions on the new `agent.crt` and `agent.key` files:
```shell
chmod 600 agent.crt agent.key
diff --git a/product_docs/docs/pem/10/certificates/replacing_ssl_certificates.mdx b/product_docs/docs/pem/10/certificates/replacing_ssl_certificates.mdx
index af690688362..7331704b016 100644
--- a/product_docs/docs/pem/10/certificates/replacing_ssl_certificates.mdx
+++ b/product_docs/docs/pem/10/certificates/replacing_ssl_certificates.mdx
@@ -91,7 +91,7 @@ To replace the SSL certificates:
openssl genrsa -out server.key 4096
```
-1. Move the `server.key` to the data directory of the backend server, and change the ownership and permissions:
+1. Move `server.key` to the data directory of the backend server, and change the ownership and permissions:
```shell
mv server.key /var/lib/edb/as/data
@@ -107,7 +107,7 @@ To replace the SSL certificates:
Where `-subj` is provided as per your requirements. You define `CN` as the hostname/domain name of the PEM server host.
-1. Use the `openssl x509` command to sign the CSR and generate a server certificate. Move the `server.crt` to the data directory of the backend database server:
+1. Use the `openssl x509` command to sign the CSR and generate a server certificate. Move `server.crt` to the data directory of the backend database server:
```shell
openssl x509 -req -days 365 -in server.csr -CA ca_certificate.crt -CAkey ca_key.key -CAcreateserial -out server.crt
diff --git a/product_docs/docs/pem/10/considerations/authentication_options/configuring_2fa_authentication.mdx b/product_docs/docs/pem/10/considerations/authentication_options/configuring_2fa_authentication.mdx
index f75bfe78ed9..59d6fcbf4b2 100644
--- a/product_docs/docs/pem/10/considerations/authentication_options/configuring_2fa_authentication.mdx
+++ b/product_docs/docs/pem/10/considerations/authentication_options/configuring_2fa_authentication.mdx
@@ -28,7 +28,7 @@ To use the email authentication method, you need to configure mail server settin
The PEM server can send email using either the SMTP configuration saved in the PEM configuration or Flask-Mail.
-To send the email verification code using the internal SMTP configuration from the PEM configuration, set the parameter `MAIL_USE_PEM_INTERNAL` to `True`. If set to `False`, the following mail configuration is used to send the code on the user-specified email address:
+To send the email verification code using the internal SMTP configuration from the PEM configuration, set the parameter `MAIL_USE_PEM_INTERNAL` to `True`. If set to `False`, the following mail configuration is used to send the code to the user-specified email address:
- MAIL_SERVER = 'localhost'
- MAIL_PORT = 25
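For example, to send the code through an external SMTP server instead, a `config_local.py` sketch might look like this (values are illustrative):

```ini
MAIL_USE_PEM_INTERNAL = False
MAIL_SERVER = 'smtp.example.com'
MAIL_PORT = 25
```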
diff --git a/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_kerberos_authentication.mdx b/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_kerberos_authentication.mdx
index 6b8b02069c6..57ef712c60e 100644
--- a/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_kerberos_authentication.mdx
+++ b/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_kerberos_authentication.mdx
@@ -13,13 +13,13 @@ You can configure Kerberos authentication for the PEM server. The Kerberos serve
- PEM server (PEM web server and PEM backend database server)
- Client machine
-For example, if the realm on Kerberos server is `edbpem.org`, then you can set the Kerberos server hostname to `Krb5server.edbpem.org`, the PEM server hostname to `pem.edbpem.org`, and the client's hostname to `pg12.edbpem.org`.The convention is to use the DNS domain name as the name of the realm.
+For example, if the realm on the Kerberos server is `edbpem.org`, then you can set the Kerberos server hostname to `Krb5server.edbpem.org`, the PEM server hostname to `pem.edbpem.org`, and the client's hostname to `pg12.edbpem.org`. The convention is to use the DNS domain name as the name of the realm.
## 1. Install Kerberos, the PEM server, and the PEM backend database
Install Kerberos on the machine that functions as the authentication server. Install the PEM server on a separate machine. For more information, see [Installing the PEM server](../../installing/).
-Install the PEM backend database (Postgres/EDB Postgres Advanced Server) on the same machine as the PEM server or on a different one. For more information, see the Installation steps on [EDB Docs website](https://www.enterprisedb.com/docs).
+Install the PEM backend database (Postgres/EDB Postgres Advanced Server) on the same machine as the PEM server or on a different one. For more information, see the installation steps on the [EDB Docs website](https://www.enterprisedb.com/docs).
## 2. Add principals on Kerberos server
@@ -125,7 +125,7 @@ $ kinit
$ klist
```
-It displays the principal along with the Kerberos ticket.
+These commands display the principal along with the Kerberos ticket.
!!! Note
The `USERNAME@REALM` specified here must be a database user with the pem_admin role and the CONNECT privilege on the `pem` database.
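A sketch of the corresponding grants (the user name is illustrative):

```sql
GRANT pem_admin TO "user@EDBPEM.ORG";
GRANT CONNECT ON DATABASE pem TO "user@EDBPEM.ORG";
```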
@@ -178,12 +178,12 @@ If the PEM server uses Kerberos authentication:
`POSTGRES_SERVICE_NAME` is the service name of the Postgres (PostgreSQL/EDB Postgres Advanced Server) database, for example, `postgresql-13` for a PostgreSQL 13 database on RHEL or Rocky Linux platforms.
!!! Note
- If you're using PostgreSQL or EDB Postgres Advanced Server 12 or later, then you can specify the connection type as `hostgssenc` to allow only gss-encrypted connection.
+ If you're using PostgreSQL or EDB Postgres Advanced Server 12 or later, you can specify the connection type as `hostgssenc` to allow only gss-encrypted connections.
## 7. Browser settings
-Configure the browser on the client machine to access the PEM web client to use the Spnego/Kerberos.
+Configure the browser on the client machine to use Spnego/Kerberos when accessing the PEM web client.
For Mozilla Firefox:
@@ -196,7 +196,7 @@ For Mozilla Firefox:
For Google Chrome on Linux or macOS:
-- Add the `--auth-server-whitelist` parameter to the `google-chrome` command. For example, to run Chrome from a Linux prompt, run the `google-chrome` command as follows:
+- Add the `--auth-server-whitelist` parameter to the `google-chrome` command. For example, to run Chrome from a Linux prompt, use this `google-chrome` command:
```ini
google-chrome --auth-server-whitelist="hostname/domain"
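# The allowlist also accepts comma-separated values and wildcard patterns (illustrative):
google-chrome --auth-server-whitelist="*.edbpem.org"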
diff --git a/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_windows_kerberos_server.mdx b/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_windows_kerberos_server.mdx
index bcdbe83efbd..f7068da7f72 100644
--- a/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_windows_kerberos_server.mdx
+++ b/product_docs/docs/pem/10/considerations/authentication_options/configuring_the_pem_server_to_use_windows_kerberos_server.mdx
@@ -41,7 +41,7 @@ Create users in Active Directory of the Windows server to map with the HTTP serv
-1. Create the user (for example, pemserverdb) in Active Directory of the Windows server to map with the Postgres service principal for the PEM backend database.
+1. Create the user (for example, pemserverdb) in Active Directory on the Windows server to map with the Postgres service principal for the PEM backend database.
## 3. Extract key tables from Active Directory
@@ -98,7 +98,7 @@ Extract the key tables for the service principals and map them with the respecti
## 4. Configure the PEM backend database server
-Add the key table location in the `postgresql.conf` file.
+Add the key table location to the `postgresql.conf` file:
```shell
krb_server_keyfile='FILE://pemdb.keytab'
@@ -147,7 +147,7 @@ $ kinit
$ klist
```
-It displays the principal along with the Kerberos ticket.
+These commands display the principal along with the Kerberos ticket.
!!! Note
The `USERNAME@REALM` specified here must be a database user with the pem_admin role and the CONNECT privilege on the `pem` database.
@@ -186,7 +186,7 @@ Restart the Apache server:
sudo systemctl restart
```
-Edit the entries at the top in `pg_hba.conf` to use the gss authentication method. Then reload the database server.
+Edit the entries at the top of `pg_hba.conf` to use the gss authentication method. Then reload the database server:
```shell
host pem +pem_user /32 gss
@@ -200,11 +200,11 @@ Edit the entries at the top in `pg_hba.conf` to use the gss authentication metho
`POSTGRES_SERVICE_NAME` is the service name of the Postgres (PostgreSQL/EDB Postgres Advanced Server) database, for example, `postgresql-13` for a PostgreSQL 13 database on RHEL or Rocky Linux platforms.
!!! Note
- You can't specify the connection type as `hostgssenc`. Windows doesn't support gss-encrypted connection.
+ You can't specify the connection type as `hostgssenc`. Windows doesn't support gss-encrypted connections.
## 7. Browser settings
-Configure the browser on the client machine to access the PEM web client to use the Spnego/Kerberos.
+Configure the browser on the client machine to use Spnego/Kerberos when accessing the PEM web client.
For Mozilla Firefox:
@@ -217,7 +217,7 @@ For Mozilla Firefox:
For Google Chrome on Linux or macOS:
-- Add the `--auth-server-whitelist` parameter to the `google-chrome` command. For example, to run Chrome from a Linux prompt, run the `google-chrome` command as follows:
+- Add the `--auth-server-whitelist` parameter to the `google-chrome` command. For example, to run Chrome from a Linux prompt, use this `google-chrome` command:
```ini
google-chrome --auth-server-whitelist="hostname/domain"
diff --git a/product_docs/docs/pem/10/considerations/authentication_options/index.mdx b/product_docs/docs/pem/10/considerations/authentication_options/index.mdx
index 8249b9e09a5..7421afdb6d9 100644
--- a/product_docs/docs/pem/10/considerations/authentication_options/index.mdx
+++ b/product_docs/docs/pem/10/considerations/authentication_options/index.mdx
@@ -9,7 +9,7 @@ navigation:
---
-PEM also supports Kerberos and two-factor authentication. For implementation instructions, see:
+PEM supports Kerberos and two-factor authentication. For implementation instructions, see:
On Linux:
diff --git a/product_docs/docs/pem/10/considerations/pem_pgbouncer/configuring_pgBouncer.mdx b/product_docs/docs/pem/10/considerations/pem_pgbouncer/configuring_pgBouncer.mdx
index 449f0b092a1..8d1943ee306 100644
--- a/product_docs/docs/pem/10/considerations/pem_pgbouncer/configuring_pgBouncer.mdx
+++ b/product_docs/docs/pem/10/considerations/pem_pgbouncer/configuring_pgBouncer.mdx
@@ -98,10 +98,10 @@ If you're running community PgBouncer, replace the names of the directories, fil
```
!!!note
- For more information on `auth_user` see [Authentication settings](https://www.pgbouncer.org/config.html#authentication-settings).
+ For more information on `auth_user`, see [Authentication settings](https://www.pgbouncer.org/config.html#authentication-settings).
!!!
-1. Create an HBA file `(/etc/edb/pgbouncer<1.x>/hba_file)` for PgBouncer that contains the following content:
+1. Create an HBA file (`/etc/edb/pgbouncer<1.x>/hba_file`) for PgBouncer that contains the following content:
```ini
# Use the authentication method scram-sha-256 for local connections
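# For example, an entry of this form (illustrative; adjust the database and user to your setup):
# local   pem   pgbouncer   scram-sha-256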
diff --git a/product_docs/docs/pem/10/considerations/pem_pgbouncer/index.mdx b/product_docs/docs/pem/10/considerations/pem_pgbouncer/index.mdx
index c134f8b5543..b2ca47e13ef 100644
--- a/product_docs/docs/pem/10/considerations/pem_pgbouncer/index.mdx
+++ b/product_docs/docs/pem/10/considerations/pem_pgbouncer/index.mdx
@@ -19,7 +19,7 @@ navigation:
You can use PgBouncer as a connection pooler for limiting the number of connections from the PEM agent to the Postgres Enterprise Manager (PEM) server on non-Windows machines:
-- [PEM server and agent connection management mechanism](pem_server_pem_agent_connection_management_mechanism) provides an introduction of the PgBouncer-PEM infrastructure.
+- [PEM server and agent connection management mechanism](pem_server_pem_agent_connection_management_mechanism) is an introduction to the PgBouncer-PEM infrastructure.
- [Preparing the PEM database server](preparing_the_pem_database_server) provides information about preparing the PEM database server to use with PgBouncer.
- [Configuring PgBouncer](configuring_pgBouncer) provides detailed information about configuring PgBouncer to allow it to work with the PEM database server.
- [Configuring the PEM agent](configuring_the_pem_agent) provides detailed information about configuring a PEM agent to connect to PgBouncer.
diff --git a/product_docs/docs/pem/10/considerations/pem_security_best_practices/apache_httpd_security_configuration.mdx b/product_docs/docs/pem/10/considerations/pem_security_best_practices/apache_httpd_security_configuration.mdx
index 2dcde8c6c57..668084b9f9d 100644
--- a/product_docs/docs/pem/10/considerations/pem_security_best_practices/apache_httpd_security_configuration.mdx
+++ b/product_docs/docs/pem/10/considerations/pem_security_best_practices/apache_httpd_security_configuration.mdx
@@ -8,7 +8,7 @@ redirects:
- /pem/latest/installing_pem_server/pem_security_best_practices/apache_httpd_security_configuration/
---
-Configure the security of the PEM web server.
+You can configure the security of the PEM web server.
On Windows, the supported web server is Apache HTTPD. Apache HTTPD is bundled with PEM under the name PEM HTTPD.
The Apache HTTPD configuration file is `pem.conf`, and the SSL configuration file is `httpd-ssl-pem.conf`. Both configuration files are in the `/conf/addons` directory.
@@ -71,11 +71,11 @@ ServerSignature Off
The directory listing allows an attacker to view the complete contents of directories from which content is served.
This listing might allow an attacker to reverse engineer an application to obtain the source code, analyze it for possible security flaws, and discover more information about the application.
-To avoid this risk, PEM disables directory listing
+To avoid this risk, PEM disables directory listing.
For NGINX, PEM sets `autoindex: off`.
-For Apache HTTPD, PEM sets setting the `Options -Indexes` directive:
+For Apache HTTPD, PEM sets the `Options -Indexes` directive:
```shell
Options -Indexes
@@ -105,7 +105,7 @@ curl -kL -X TRACE https://pem-server:8443/pem
## Optimize HTTP headers for security
PEM sets various HTTP header options to improve security.
-These settings are defined in the `config.py` and and `config_distro.py` files.
+These settings are defined in the `config.py` and `config_distro.py` files.
These files are located in `/usr/edb/pem/web` on Linux and in `C:\Program Files\edb\pem\server\share\web` on Windows.
If you want to alter any of these settings, don't edit these files. Instead, create (or edit, if it already exists) a file named `config_local.py` in the same location and add your desired settings.
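For example, a `config_local.py` override might look like this sketch, using the cookie settings described later on this page:

```ini
SESSION_COOKIE_SECURE = True
SESSION_COOKIE_HTTPONLY = True
```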
@@ -168,7 +168,7 @@ Cookies are small packets of data that a server sends to your browser to store c
SESSION_COOKIE_SECURE = True
```
-- SESSION_COOKIE_HTTPONLY — By default, JavaScript can read the content of cookies. The `HTTPOnly` flag prevents scripts from reading the cookie. Instead, the browser uses the cookie only with HTTP or HTTPS requests. Hackers can't exploit XSS vulnerabilities to learn the contents of the cookie. For example, the `sessionId` cookie never requires that it be read with a client-side script. So, you can set the `HTTPOnly` flag for `sessionId` cookies. The default is:
+- SESSION_COOKIE_HTTPONLY — By default, JavaScript can read the content of cookies. The `HTTPOnly` flag prevents scripts from reading the cookie. Instead, the browser uses the cookie only with HTTP or HTTPS requests. Hackers can't exploit XSS vulnerabilities to learn the contents of the cookie. For example, the `sessionId` cookie never needs to be read by a client-side script. So, you can set the `HTTPOnly` flag for `sessionId` cookies. The default is:
```ini
SESSION_COOKIE_HTTPONLY = True
@@ -181,7 +181,7 @@ Cookies are small packets of data that a server sends to your browser to store c
```
!!! Note
- This option can cause problems when the server deploys in dynamic IP address hosting environments, such as Kubernetes or behind load balancers. In such cases, set this option to `False`.
+ This option can cause problems when the server deploys in dynamic IP address hosting environments, such as Kubernetes or behind load balancers. In these cases, set this option to `False`.
To apply the changes, restart the web server.
@@ -204,7 +204,7 @@ Running the web server as a root user can create a security issue. We recommend
!!! Note Variations in WSGI service by platform
PEM runs as a WSGI application. On Linux, when the web server is NGINX, the WSGI application is run by a separate service, `edb-uwsgi`, which runs as the pem user.
-When the web server is Apache HTTPD, the WSGI application is run by a daemon process which is a child of the Apache HTTPD process. The daemon process is run as the pem user.
+When the web server is Apache HTTPD, the WSGI application is run by a daemon process that's a child of the Apache HTTPD process. The daemon process is run as the pem user.
On Windows, the `WSGIDaemonProcess` directive and features aren't available, so both the web server and the WSGI app run as the system user (the `LocalSystem` account).
!!!
@@ -212,7 +212,7 @@ On Windows, the `WSGIDaemonProcess` directive and features aren't available, so
### Restrict access to a network or IP address
It's a good practice to restrict access to the web server to the smallest set of IP addresses compatible with your business needs.
-This is most commonly done at the network infrastructure level, for example through firewall configuration, but can also be enforced by the web server.
+This is most commonly done at the network infrastructure level, for example, through firewall configuration, but can also be enforced by the web server.
The PEM application configuration file (`/web/config_local.py`) supports an `ALLOWED_HOSTS` configuration parameter for this purpose.
For example:
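The following is a sketch; the networks listed are illustrative:

```ini
ALLOWED_HOSTS = ['192.168.0.0/24', '10.0.0.0/8']
```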
From ffa64a192d402a2a2075041a92dbedbcdccd46d9 Mon Sep 17 00:00:00 2001
From: Lawrence Man <60245737+LawrenceMan@users.noreply.github.com>
Date: Fri, 9 May 2025 11:37:56 +0800
Subject: [PATCH 04/42] Update 01_cluster_properties.mdx
Align with the values specified in efm.properties.in
---
.../docs/efm/5/04_configuring_efm/01_cluster_properties.mdx | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/product_docs/docs/efm/5/04_configuring_efm/01_cluster_properties.mdx b/product_docs/docs/efm/5/04_configuring_efm/01_cluster_properties.mdx
index 77270721298..e6bd4e446ca 100644
--- a/product_docs/docs/efm/5/04_configuring_efm/01_cluster_properties.mdx
+++ b/product_docs/docs/efm/5/04_configuring_efm/01_cluster_properties.mdx
@@ -142,8 +142,8 @@ Use the properties in the `efm.properties` file to specify connection, administr
| [log.dir](#log_dir) | | | | If not specified, defaults to '/var/log/efm-<version>' |
| [syslog.host](#syslog_logging) | | | localhost | |
| [syslog.port](#syslog_logging) | | | 514 | |
-| [syslog.protocol](#syslog_logging) | | | | |
-| [syslog.facility](#syslog_logging) | | | UDP | |
+| [syslog.protocol](#syslog_logging) | | | UDP | |
+| [syslog.facility](#syslog_logging) | | | LOCAL1 | |
| [file.log.enabled](#logtype_enabled) | Y | Y | true | |
| [syslog.enabled](#logtype_enabled) | Y | Y | false | |
| [jgroups.loglevel](#loglevel) | | | info | |
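For example, the syslog-related defaults in this table correspond to entries of this form in `efm.properties` (illustrative):

```ini
syslog.host=localhost
syslog.port=514
syslog.protocol=UDP
syslog.facility=LOCAL1
syslog.enabled=false
```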
From f08293fdc2579529049a8080b74ce8cc7d59cb9c Mon Sep 17 00:00:00 2001
From: Lawrence Man <60245737+LawrenceMan@users.noreply.github.com>
Date: Sun, 11 May 2025 14:24:05 +0800
Subject: [PATCH 05/42] Update index.mdx
Modify version number.
---
product_docs/docs/efm/5/efm_quick_start/index.mdx | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/product_docs/docs/efm/5/efm_quick_start/index.mdx b/product_docs/docs/efm/5/efm_quick_start/index.mdx
index 6002efe4fb7..95206902d7d 100644
--- a/product_docs/docs/efm/5/efm_quick_start/index.mdx
+++ b/product_docs/docs/efm/5/efm_quick_start/index.mdx
@@ -21,7 +21,7 @@ Using EDB Postgres Advanced Server as an example (Failover Manager also works wi
- Install Failover Manager on each primary and standby node. During EDB Postgres Advanced Server installation, you configured an EDB repository on each database host. You can use the EDB repository and the `yum install` command to install Failover Manager on each node of the cluster:
```shell
- yum install edb-efm49
+ yum install edb-efm50
```
During the installation process, the installer creates a user named efm that has privileges to invoke scripts that control the Failover Manager service for clusters owned by enterprisedb or postgres. The example that follows creates a cluster named `efm`.
From 75af114fd99986ea940902c57d141ece1d9d2562 Mon Sep 17 00:00:00 2001
From: Dj Walker-Morgan
Date: Wed, 14 May 2025 17:53:18 +0100
Subject: [PATCH 06/42] Release Notes for 4.1.0 stubbed
Signed-off-by: Dj Walker-Morgan
---
.../ai-accelerator_4.1.0_rel_notes.mdx | 23 +++++++++++++++++++
.../ai-accelerator/rel_notes/index.mdx | 2 ++
.../rel_notes/src/rel_notes_4.1.0.yml | 17 ++++++++++++++
3 files changed, 42 insertions(+)
create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
new file mode 100644
index 00000000000..cad6f23262f
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
@@ -0,0 +1,23 @@
+---
+title: AI Accelerator - Pipelines 4.1.0 release notes
+navTitle: Version 4.1.0
+originalFilePath: advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
+editTarget: originalFilePath
+---
+
+Released: 19 May 2025
+
+This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline.
+
+## Highlights
+
+- MOAR AI
+
+## Enhancements
+
+| Description | Addresses |
+|-------------|-----------|
+| Placeholder for future release note. Soon. | |
+
+
+
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx
index dbc87bf6dfd..a46870bd873 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx
@@ -4,6 +4,7 @@ navTitle: Release notes
description: Release notes for EDB Postgres AI - AI Accelerator
indexCards: none
navigation:
+ - ai-accelerator_4.1.0_rel_notes
- ai-accelerator_4.0.1_rel_notes
- ai-accelerator_4.0.0_rel_notes
- ai-accelerator_3.0.1_rel_notes
@@ -22,6 +23,7 @@ The EDB Postgres AI - AI Accelerator describes the latest version of AI Accelera
| AI Accelerator version | Release Date |
|---|---|
+| [4.1.0](./ai-accelerator_4.1.0_rel_notes) | 19 May 2025 |
| [4.0.1](./ai-accelerator_4.0.1_rel_notes) | 09 May 2025 |
| [4.0.0](./ai-accelerator_4.0.0_rel_notes) | 05 May 2025 |
| [3.0.1](./ai-accelerator_3.0.1_rel_notes) | 03 Apr 2025 |
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
new file mode 100644
index 00000000000..d7c8eebe66a
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
@@ -0,0 +1,17 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/EnterpriseDB/docs/refs/heads/develop/tools/automation/generators/relgen/relnote-schema.json
+product: AI Accelerator - Pipelines
+version: 4.1.0
+date: 19 May 2025
+intro: |
+ This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline.
+highlights: |
+ - MOAR AI
+relnotes:
+- relnote: Placeholder for future release note.
+ details: |
+ Soon.
+ jira: ""
+ addresses: ""
+ type: Enhancement
+ impact: Medium
+
From abed1b5edbd8a5a19d53d83956a301190dfd6b36 Mon Sep 17 00:00:00 2001
From: Noah Baculi
Date: Wed, 14 May 2025 21:09:55 -0700
Subject: [PATCH 07/42] fix typo
---
.../ai-accelerator/preparers/examples/chunk_text.mdx | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx
index aa4340663cb..906f59d5d99 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx
@@ -3,7 +3,7 @@ title: Preparers chunk text operation examples
navTitle: Chunk text
description: Examples of using preparers with the ChunkText operation in AI Accelerator.
---
-These dxamples use preparers with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator.
+These examples use preparers with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator.
## Primitive
From 0340e4a3a05e81ca259d03e1dfa2ae3b33f10438 Mon Sep 17 00:00:00 2001
From: Noah Baculi
Date: Wed, 14 May 2025 21:10:13 -0700
Subject: [PATCH 08/42] update primitive examples with output
---
.../ai-accelerator/preparers/primitives.mdx | 71 +++++++++++++++++--
1 file changed, 65 insertions(+), 6 deletions(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx
index 2d0e2d556b5..a661a32883b 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx
@@ -17,20 +17,25 @@ All data preparation operations can be customized with different options. The AP
Call `aidb.chunk_text()` to break text into smaller chunks.
```sql
-SELECT
- chunk_id,
- chunk
-FROM aidb.chunk_text(
+SELECT * FROM aidb.chunk_text(
input => 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence is 145 characters. This enables processing or storage of data in manageable parts.',
options => '{"desired_length": 120, "max_length": 150}'
);
+
+__OUTPUT__
+ part_id | chunk
+---------+---------------------------------------------------------------------------------------------------------------------------------------------------
+ 0 | This is a significantly longer text example that might require splitting into smaller chunks.
+ 1 | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence is 145 characters.
+ 2 | This enables processing or storage of data in manageable parts.
+(3 rows)
```
- The `desired_length` size is the target size for the chunk. In most cases, this value also serves as the maximum size of the chunk. A returned chunk can be shorter than `desired_length` if adding the next piece of text would have pushed it over that size.
- The `max_length` size is the maximum possible chunk size that can be generated. Setting this to a value larger than `desired_length` means the chunk should be as close to `desired_length` as possible but can be larger if that keeps the text at a larger semantic level.
-!!! Note
-This primitive function returns each chunk with a `chunk_id` for ease of development. However, a preparer with the `ChunkText` operation outputs a single text array per input that can then be unnested as desired.
+!!! Tip
+This operation transforms the shape of the data, automatically unnesting collections. As a result, there may be multiple output rows for each input with a new `part_id` column to track the additional dimension.
!!!
## Parse HTML
@@ -55,6 +60,22 @@ SELECT * FROM aidb.parse_html(
',
options => '{"method": "StructuredPlaintext"}' -- Default
);
+
+__OUTPUT__
+ parse_html
+-----------------------------------------------------------
+ Hello, world! +
+ +
+ This is my first web page. +
+ +
+ It contains some bold text, some italic text, and a link.+
+ +
+ Postgres Logo Image +
+ List item +
+ List item +
+ List item +
+
+(1 row)
```
- The `method` determines how the HTML is parsed:
@@ -70,12 +91,24 @@ SELECT * FROM aidb.parse_pdf(
bytes => decode('255044462d312e340a25b89a929d0a312030206f626a3c3c2f547970652f436174616c6f672f50616765732033203020523e3e0a656e646f626a0a322030206f626a3c3c2f50726f64756365722847656d426f782047656d426f782e50646620312e37202831372e302e33352e313034323b202e4e4554204672616d65776f726b29292f4372656174696f6e4461746528443a32303231313032383135313732312b303227303027293e3e0a656e646f626a0a332030206f626a3c3c2f547970652f50616765732f4b6964735b34203020525d2f436f756e7420312f4d65646961426f785b302030203539352e3332203834312e39325d3e3e0a656e646f626a0a342030206f626a3c3c2f547970652f506167652f506172656e742033203020522f5265736f75726365733c3c2f466f6e743c3c2f46302036203020523e3e3e3e2f436f6e74656e74732035203020523e3e0a656e646f626a0a352030206f626a3c3c2f4c656e6774682035393e3e73747265616d0a42540a2f46302031322054660a3120302030203120313030203730322e3733363636363720546d0a2848656c6c6f20576f726c642129546a0a45540a656e6473747265616d0a656e646f626a0a362030206f626a3c3c2f547970652f466f6e742f537562747970652f54797065312f42617365466f6e742f48656c7665746963612f4669727374436861722033322f4c61737443686172203131342f5769647468732037203020522f466f6e7444657363726970746f722038203020523e3e0a656e646f626a0a372030206f626a5b3237382032373820302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203732322030203020302030203020302030203020302030203020302030203020393434203020302030203020302030203020302030203020302030203535362035353620302030203020302030203020323232203020302035353620302030203333335d0a656e646f626a0a382030206f626a3c3c2f547970652f466f6e7444657363726970746f722f466c6167732033322f466f6e744e616d652f48656c7665746963612f466f6e7446616d696c792848656c766574696361292f466f6e74576569676874203530302f4974616c6963416e676c6520302f466f6e7442426f785b2d313636202d3232352031303030203933315d2f436170486569676874203731382f58486569676874203532332f417363656e74203731382f44657363656e74202d3230372f5374656d482037362f5374656d562038383e3e0a656e646f626a0a787265660a3020390a303030303030303030302036353533352066200a30303030303030303135203030303030206e200a30303030303030303539203030303030206e200a30303030303030313739203030303030206e200a30303030303030323537203030303030206e200a30303030303030333436203030303030206e200a30303030303030343531203030303030206e200a30303030303030353733203030303030206e200a30303030303030373733203030303030206e200a747261696c65720a3c3c2f526f6f742031203020522f49445b3c39333932413539463342453742383430383035443632373436453841344632393e3c39333932413539463342453742383430383035443632373436453841344632393e5d2f496e666f2032203020522f53697a6520393e3e0a7374617274787265660a3938380a2525454f460a', 'hex'),
options => '{"method": "Structured", "allow_partial_parsing": true}' -- Default
);
+
+__OUTPUT__
+ part_id | text
+---------+--------------
+ 0 | Hello World!+
+ |
+(1 row)
```
- The `method` determines how the PDF is parsed:
- `Structured` (Default) — Algorithmic text extraction.
- The `allow_partial_parsing` flag determines whether to continue to parse PDFs when the parser encounters errors on one or more pages. Defaults to `true`.
+
+!!! Tip
+This operation transforms the shape of the data, automatically unnesting collections. As a result, there may be multiple output rows for each input with a new `part_id` column to track the additional dimension.
+!!!
+
## Summarize text
Call `aidb.summarize_text()` to summarize text:
@@ -88,6 +121,17 @@ SELECT * FROM aidb.summarize_text(
input => 'There are times when the night sky glows with bands of color. The bands may begin as cloud shapes and then spread into a great arc across the entire sky. They may fall in folds like a curtain drawn across the heavens. The lights usually grow brighter, then suddenly dim. During this time the sky glows with pale yellow, pink, green, violet, blue, and red. These lights are called the Aurora Borealis. Some people call them the Northern Lights. Scientists have been watching them for hundreds of years. They are not quite sure what causes them. In ancient times Long Beach City College WRSC Page 2 of 2 people were afraid of the Lights. They imagined that they saw fiery dragons in the sky. Some even concluded that the heavens were on fire.',
options => '{"model": "my_t5_model"}'
);
+
+__OUTPUT__
+ create_model
+--------------
+ my_t5_model
+(1 row)
+
+ summarize_text
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ the night sky glows with bands of color . they may begin as cloud shapes and then spread into a great arc across the entire sky . the lights usually grow brighter, then suddenly dim .
+(1 row)
```
- The `model` is the name of the created model to use for summarization. The model must support the `decode_text()` and `decode_text_batch()` [model primitives](../models/primitives).
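Because the example output above includes a `create_model` confirmation, a model is created first along these lines (a sketch; `t5_local` assumes a locally available T5 model provider):

```sql
SELECT aidb.create_model('my_t5_model', 't5_local');
```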
@@ -108,11 +152,26 @@ SELECT * FROM aidb.perform_ocr(
decode('/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAUFBQUFBQUGBgUICAcICAsKCQkKCxEMDQwNDBEaEBMQEBMQGhcbFhUWGxcpIBwcICkvJyUnLzkzMzlHREddXX0BBQUFBQUFBQYGBQgIBwgICwoJCQoLEQwNDA0MERoQExAQExAaFxsWFRYbFykgHBwgKS8nJScvOTMzOUdER11dff/CABEIAWYDKgMBIgACEQEDEQH/xAAvAAEAAQUBAQEAAAAAAAAAAAAAAQIDBAcIBgkFAQEAAAAAAAAAAAAAAAAAAAAA/9oADAMBAAIQAxAAAADsuKLhCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRE2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAeS/INiNdjYj8z9MAAAAAAAAAAFJUiQAAAAAAAAA8l60AAAAAAAAAAAAAtXbV0pqpqAAAAAAAAAAESNE617AwD5kdJap+gp5L2PC/459AJ5c/ROko4DxD6EU+I4rPoY+enVJuJzfpI79ngXrU2LT88PfnaEcVfhHek8K9jno6fnrmH0A5c/S5YO89icId3FSOODsiOC847meN4qPoPPzx6iN0zw9hHeD5yeqO8Y+cfeZ6ev5uepO+nzz71OD/AKDfPnsc2BHz7xj6HPBe9AAAAAAAAAAALV21dKaqagAAAAAAAAAABgZ+AcB9U8rddHGO7vC9XHz43/me6NMbI3JxwdYcV9C7HNYbf17pY1f0lo3uI4n3V+3+oaE745D6qNBYHPvdp89uldE78PWb15730as1P+h+cbw2VrDZw4F7757Nd+52Vwsd76d8b6I8H6LZ2sjWvf8AwN9Bjh7qvlbrY0P7bxfuDkP6Q/M/6UnPuDbzzRP6f5npD9GxuT9Y557i576EAAAAAAAAAAALV21dKaqagAAAAAAAAAAB+f8AoDkPrmscZeZ7zHN3ot3jgnI7uHhuRe8xwb0NuwaL5774HEXUXuxzt0HeHCE93DgLq7Z44Rx+9hqWxuEcQ9vRJHMHUA4O/Y7ZGpOWfoCOJMTuYci9cVDmPf8A+6NIei2bQfMjZn5vf5wr2z+kOReivYDgivvMeI9wAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAiRh5gAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVu2ZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHFd2xfIgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIrD//EADIQAAAFBAEEAQMDAgcBAAAAAAABAwQFAgYHERQIEDI0YBIhQBYxNxUXExgiMzZBUTX/2gAIAQEAAQgAM/2IaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMGejIjCn+9R8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl+PWJq+bPtt3Q0mv7s40H92caA8uY2IvtEzEXOsUZCK/FMGYI/w6vsQjr6tGWlFIljTv/v8qvzT7Uewl+PV+32yLgyLyLO0y7qS6V4JjHP3dOLLBaZCuv8Aobk+k6AFiWehYtuM4Fv/AKgRgxVVsEezBjf3BmewRgz0NmCP7gwdX/h1aLY2exv/AMI/uDFVWwZiqrY6nZyZhIi0zjMSvXL7G9ouXdPYx9QIwZ6BfuDq+w+obH1D6h9Q+ob0Nj6hsGMOfzu9HiPqBH+TX5p9qPYS/I0LiPcFOjplL68nBdZJuiqsreWX76v+4lLesZ/jvPVutq5YsIZrf3O8/TNyZ3y5I2OTSFhY7Hud7uaJTJxuRsoYkuNCPuS5L4joCyFrtDGYzJmSQd1xU3B5xxeknLr4ayVXkS3FFHubsyubKqRgYJpYGe7sbUzNdr5ZyDjC4k4a8MgTSieOLgmIiAzpdUPa0qzPEUNll3e8RO3FlDMt0zN0r2jZR4kzyuib6vFOWrzgL0ZWrdFz3JHWlBSM3Jr35ljME8sxgZayc8WEzrmDw9lx9kSMmIaRyVCZPim8VXe2LbfzKudmvWlAqPQzNma40blVs+1KMUZ7WQ/qJ42zFeFrXY3te8r7vBnY1tSE46j5TNmY3rpeMuC182Y1bFMr4OyO8yFbjs5LM+QbptfKRos6Es5Zg/x5dkyvnJ2ILppYzNUnmjNDp28g5STy/h+XZFI2fciV22n
DzidWYb7gLunq6GdjZ5vZrTMqxeSsoYluGiPuGCmWdww7CUZYc/nZ6MjX0yx/bjiVcsl83ZiXcPWD93mrD7xs6kMb36wyBbbeUQ/Hr80+1HsJfk3Ee4KdHTL/ACcM9SriIxjOKIYeynb+M05ZV5/mxt0K3SweZQZXFCdR2PZ2adR91RNn9TEtAs20XcJ3zhLLTqPSnb+sei4McvrYhrMv28sLSD+NcodRWPLqj14q48cwOP4ti5fWYtdbOjKbm5Zz/NfbtJERZgytb2TG0WbS2ZNaT6ZJMluma02U7c0tLvf+vs6h8K4muKidcSnVLaTYq6Y5tcLy6svQU296q5RZC2bYjKOnSAaxWN457QommqnWnXaOFrQsqfonYnqxMv6PZowyZHiqySMVHovvmmxrqtW/nV3xtu9VWkk0bijHuD8qz9Ekd9Wnbl4QpsrijMq4cxdG1QUHkbqGQu+3ZSChukv1r2GfUKHeYTQrYMGsYyaMWvVikmU9ai9ONo5tH2LaTZr1VoUfo6AVPAf8VQQx5ANbjzXWzdU0kRERdS1uspGw/wCrH0uy67yy5Rirhn+eXY6pZZda7ouMFsdSNqWxARcO1vDqLtS7rblIRx0pyaqc9cUaZfj1+afaj2EvybiPcFOjpl/k4ZphlZvGlxtkenJGyJn+sQk//bTH5Ee17xxknfqNqxV75zj7Euv9OO5CybEu9qi5eZ1xRbthpxsnCWdlF3a+E4K4ZOyrstTMMCu8Xv7AtgPoeUkGHTBKvm15ykRSszibQzc5bXCjjrHLpFNZvk55ivGqEcS8k6jpDAs6/jekoqDZ3mZz7xWMgZyQRxHbLXKGQXn6lj7CseCR+prPS8ZOZ7YvYvqohl3Vq29LJ9N11sZexW8IcjJM4pg6fvsU5ou/IN81xVfVgW4izRhqojxdZZkW9isN89wru917NfzuK8f3GVZvst2UzxjeDJCFz7dkypj3HiAwhiOyntmxdwymYCtG0Mdz6KfSWW0L1pLOP80tu3Vl/wDYs4WP/wAMtQdVf/CoMYC/iWCFm3OhaGYlJVw2dJOUUXDbqZu1kytJK3qOmaHVj7BcvVMM/wA8ux1Vw6qM/b8xTj23cYXhacRJNrqtbGFpwT+ZfYVua1LudTDqF/Hr80+1HsJfkyyCrqKlWyWEcSXpZV8HLzCqdKidSdd/dO00hLqzVkHavUi/QOLWxDg+myXZTk7l7EDXIyDZ2zY2P1D2jQbGJa4PypfEmg7vFTH1uHZX6NN1g7KlkSqjm0Xlk9RN2pkxlsQ4haY4aLuXWXcNMsip0SLJlY3UNahVMIm2sAXpc03RLX5eltqSNgTdvw+AMe3LYTe4051dJJwiskrceAr+tafVkrO/QfUJeRExmrs6e7ytqSil7XtO3Zaaxq2hL9mMBZFs2YqkbMe48z/exJs5vFeLo/GsWskWZscOskW22aMLKxbmi25q30zpFX7fbL+B1btk1bhtttb3UrFJUMG1qdPd2zs4lL33lXGTfIFsNY1tGWDn+zf8VhCRuA7/ALwVXf3vi3HWXLNvOPoLKOI7zufJSE9GfcZ/xjdl+SNuLwdrMXMbbsAwc56sefvm2othCYpt2UtOw4yHlLbs6i/MmzkFWlYnUJZlNcbCW50/XxdMyUnesZGs4dg0jmOOMS3lbmVHFxSF62bE31AuYiTVwxmCxpBeu1VMVZvv1y3TuawLHi7AgUYlh+PX5p9qPYS/K120Q0Q0Q0Q0NENENENDRDQ0NENF30Q0NENENENDRdtF+3fQ0Q0WxoaIaGiGiGiGi7q/7dQwwnWWd3lRl+/30Q0NENECIiGi/Jr80+1HsJfAzLYoYR6KhrIfn1+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2WqIlaa6uUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgProVqoOkVUFUDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEKaaaf2H//xABDEAABAwEFBQcBBAYIBwAAAAABAAIDBAUREpKhEBNhYrEGFDEyQVFgQCEiU1QVQlBSgpMjM0NVcoGywgcgMDSRs9H/2gAIAQEACT8A8SjojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojovXZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fUdo6SinczGI5XLttZmddtrMzrttZv8xV8dXRy3hs0f2tJb4/sztFS1FoNLw6njde8GPz5frObZx6fUW9U0jxAyLdRxtcu1lc90FPJKGmBgBLGkq0JaSLu80u9jYHOvjXa+v/AJDFWPqooHvcJXtDXHEb/Af8/h9Bac9IZqucSmJ5aTc1TvmqJaO+SR5vc44vpPzVp9T9ZzbOPT6n0s2quyFfkKlSBkcbS9zifsAb9pKE0NKHmNhg/rZgPsLy70arQrH7sbx4gqzM9v8ACnD9JNYe61HlMnuxw/fVzbSqot6+o/CjVrVMYnG8jE1YYXEH1whTTz0wcHS01QcYfG7xdG5f0sHdWTQR+rzJ5Qq+aOmidc5scu4gi9mK16
s0sbxfLHUGoY3/ABgpgZaVE4RVIHg/1EgCwPteaO+SXzCBjv8AeVaVVHvRvGCerMDsoXeZ6IOAmhqPtexhPnjcq4tcbO31NUR+mLwIKtGeutmsqg2nmm+/uIsIBLPckplqGzXMmc99Q8hpxs/cUksbY5jSmSEXzVEwNxwezFa8u9wYt0a8h6nmqYZ6plC5lR54Ji7A1SYaejjxkDxe7wDBxJKkngg+1zaWmO6jiZ7yPVqVckEH35TBVGcsHuWlBjLfpKKSWGSMYWzM8gdweHFVU8sT5ZW0ollD7n/rqtqWdm+9QPwGYBndhL94Ydr3wSQvZDPNH9sss0g8katWYTFuLdOryHqWaeB84pH94/raaRxuDuLUwybkBsUXrJJIbmtVdNHSxuIO7k7vTx4v1VbNX3eJwxTQ1RnDPbECmN/SNnSMimkHhIJAS1ytipZQQRU0rqWN9zHKokorMc4iCPfd3i/gVXUyNYQZ6SeTeMmid6scU6WhsdkhbEGy7iLN4uerVqRj+8zFNv4Jg0+5UeA1dNjey+/C5t4IVsz1TI6mtghglOONpc4sblVqVMTJhjiE1UYC4cGhSTzwNI3lNUnGHx+8b1JipquASMP+L0dxC/M2n1KaJJvJTQeBkkKrJ4qMO/Vl7rTt4MKral1KXAXul7xTycpKYI52nBVQfhyj6jm2cen1PpZtVdkK/I1CfhfUGOnBBuID1YE9bXVTgGzxva3BEB5V2Rr/AOexUT7PgmtOCUQuIJbicA//ACKpH1MDKQR1LGeZmFWJ3tsDBGJYzuprmqIxVsbcEIqgY/MfKHBNZHuoIu4sLvu/0P2gf5hWOdzLLfUUtQwt++BdiY5WTVUsFVHupmkCVjghA6mrLt5JE8uxYfAG9UcldTxWnNO6naQC7CTgAv8AQFdkK7+exdn56OvpHkb+V7XYoj4tTi99LR1MN/sxktzQoWyiyoY9yx/4s5P3tloystgGR4iMxneDN5jgVi11W7wYTdEFSimnrO0FBIYgCAwb5gATiGVtdLLIPfu7P/siiHebUlmqJ3n2a8xsGUJrXteC1zXC8EO9CpKwTtbIwMfJezDIF+bqV+S/3O20sktHNVx1kFREMe5mFxucuzpMgAa6ekIGZpTGPtt+7O7nvhe8xqR7KCGQVDnMl3IaYwftJVXU1UMc75SIgZcTj7vK7PTMiqmFk1RMQ4hi/fouj15JY6RmYqNsUFPEyKJjfAMaLgmAPkpKkHMFC1jTZ0Eh9LzI0PJTBjbaXVhXtP1TA+FlqVtQ5h9dy8vQAA+wAeiiAqrOqY8D/dkieXsoa26L+ML83aX+op5EFLRh4Z7ueux1YI6WBkZLZWAOcux9ZdUxEMe6RhDH+j1J/RTUrJAznH1HNs49PqfSzaq7IV+RqFFjljiE7GjzExlWXST15eJqV84F7m+BYF2Ss4Aeu7X/AA6pKyYVrIGVceHDiVgTSsDID3gPAbgmCsChqmTRh7JWxgEh3qCFVPjZVzPYaR5xFuEediop7Q3VUaIgPuOEOIa4kqw6fFE8xTU1QGyPCphZFRTQSTB8Trove5zSpXvo5qGSVzB5Q+FwAeqBktlstWZsscovaYpvK7Vdl7NkikaHMexgIcHeoK7EUFbV1TzdTsAa5sbfFxVgCyKers2SdlI32c8XPX4lOm4paShnnY26+90bCQP/ACFVvmbuZKyYF9zp34gMK7OWdTxxC8vMTSAPcl6LDRO7RWZFCWC5pEL2RphLKGufHNwE7fE5FOzv1kySMfET950T3l7XBVLIaanidJLI43BrGi8lUVKyy2RVNRKQw42RDyakL83Uoj/sbjndtsGSnAr5qE1M8jTGHMcWAldmaXfP/tYRunDiCFakr2vibVRfiwEOICkkjFtUMVTW85ETDhVBFaNbWY33Sm9kVzixWdRUs9ZSvpaaNkTQ8umC/Eouj17UOz8rVdWL+6qP/wBTV/eY/wBDl+9OjdTC1ayKd3tHM8sJUzJYZWB0b2G8ODvUFTtdXV08b3RjzMijUZYa6te9vM2ML83aX+oqAiKamfC9/GNdm7Nkl3EbKhpYC4StFxxhdlrOENMwuwlgBf7MHErsPFY/dY2MdUAg4yf1PqObZx6fUgGSekmjYD4XuaQFSQRUndZo8TJg43yJoc1zS1zT4EO9FUhoMpl7sX7p8TuQqrtEU7xc8PnAap2VNrf2IZ5YL+r1MyltimZcyU+SSP2ep6zu7XXNEE4cxWk6KIOF8k8u8kw+zGhUx/Rgg3QH6wP4l6tEzRknBLTS7p/8TSqmsFK/z94nDI1OKi16pobLMBcxjP3GKZlLbEEeASHySNHgx6qa1tMz7g7vUB0StAiEPBkY6XeTTAfbhVOwSSUHd6WLytGG64KmijdVPhMOB+NMDo5GOY5p9Q4XEKd08AlL6Z8MuCeIFWnVQUZuEneZw1t38CZJaAZHC8zMIY+OpYg6prqqCdlaHuDnXOlJj/zDVXmojBO4mil3c4BVVUmmxjGKmoAYFM2ptKrINVVe4Hgxns0KoZFX0VRv4d55X/dIc0lVZhsmntGCSohZV3s3QlBftnigtGQX1MDzgErx9gcFU2juQLmXTtcq4iLGHyxbzezy+zT7KRlLV0Lg+iP6jcIuwkexCNZDTueQe7TAxK3XtkER3Mb5N7IXdGBU09JZTq2H9IvZIN1NDG9UkL6AClveZA3yFFUsMrKWCdk2OTBcZCCgN9S0FNDIAbwHxsAOoUDJJ4a4SyBz8ADcJCiZHVwiTG1hxAXkkKpMBmntN7JPaSMuIVfVyUYJDO6zAsVaYYXODpy9+9nkChbFTU8QjiY30DVSRMs581a8PbMC66YnCmnC770Uo80UjfBwVdJNCfLLSS4C4c7SrRlZTA+ermvDOIaEMb/PUTnzSv8AqObZx6ftQf8AX9nJhwie1P2BzbOPT4JQwRyG++RsbQ84uLR+wObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZ4BPOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKN4F/p8mGz//xAAUEQEAAAAAAAAAAAAAAAAAAACg/9oACAECAQE/ABK//8QAFBEBAAAAAAAAAAAAAAAAAAAAoP/aAAgBAwEBPwASv//Z', 'base64'),
options => '{"model": "my_paddle_ocr_model"}'
);
+
+__OUTPUT__
+ create_model
+--------------
+my_paddle_ocr_model
+(1 row)
+
+ part_id | text
+---------+------------------
+ 0 | Tesseract sample
+(1 row)
```
- The `model` is the name of the created model to use for OCR. The model must support the `perform_ocr` operation.
!!! Tip
+This operation transforms the shape of the data, automatically unnesting collections. As a result, there may be multiple output rows for each input with a new `part_id` column to track the additional dimension.
+!!!
+
+!!! Note
Limitations of the model still apply. For example, the [NVIDIA NIM Image OCR API](https://docs.nvidia.com/nim/ingestion/table-extraction/latest/api-reference.html) model provider only supports `png` and `jpeg` image inputs.
!!!
From 9bd74cc87a180eb10bed25c6285d6348b8c9ce2b Mon Sep 17 00:00:00 2001
From: Noah Baculi
Date: Wed, 14 May 2025 21:11:31 -0700
Subject: [PATCH 09/42] fix typo
---
.../edb-postgres-ai/ai-accelerator/preparers/usage.mdx | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx
index c148f6c5463..24e20f2c607 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx
@@ -10,7 +10,7 @@ description: "Usage of preparers in AI Accelerator Pipelines."
The source data for a preparer can come from a Postgres table or a PGFS volume. Given the different nature of the data sources and the options required for each, you use different functions to create them.
!!! Note
-You can customze te behavior of the data preparation operation for the preparer with different options. The API for these options is identical between the primitives and the preparer, so you can prototype options with the `aidb.chunk_text()` primitive for use with a scalable preparer that performs the `ChunkText` operation. Learn more in [Primitives](./primitives).
+You can customize the behavior of the data preparation operation for the preparer with different options. The API for these options is identical between the primitives and the preparer, so you can prototype options with the `aidb.chunk_text()` primitive for use with a scalable preparer that performs the `ChunkText` operation. Learn more in [Primitives](./primitives).
!!!
## Preparer for a table data source
From 2f4d31b9b21d965bb490b80754c821f4205ada5b Mon Sep 17 00:00:00 2001
From: Noah Baculi
Date: Wed, 14 May 2025 21:13:02 -0700
Subject: [PATCH 10/42] update preparer usage with better column name for
unnested chunk
---
.../edb-postgres-ai/ai-accelerator/preparers/usage.mdx | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx
index 24e20f2c607..43213770488 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx
@@ -41,7 +41,7 @@ SELECT aidb.create_table_preparer(
source_table => 'test_source_table',
source_data_column => 'content',
destination_table => 'chunked_data_destination_table',
- destination_data_column => 'chunks',
+ destination_data_column => 'chunk',
source_key_column => 'id',
destination_key_column => 'id',
options => '{"desired_length": 100}'::JSONB -- Configuration for the ChunkText operation
@@ -73,7 +73,7 @@ SELECT aidb.create_volume_preparer(
operation => 'ChunkText',
source_volume_name => 'test_volume',
destination_table => 'chunked_data_destination_table',
- destination_data_column => 'chunks',
+ destination_data_column => 'chunk',
destination_key_column => 'id',
options => '{"desired_length": 100}'::JSONB -- Configuration for the ChunkText operation
);
@@ -108,7 +108,7 @@ SELECT * FROM aidb.preparers;
__OUTPUT__
id | name | operation | destination_schema | destination_table | destination_key_column | destination_data_column | options | source_type | source_schema | source_table | source_data_column | source_key_column | source_volume_name
----+---------------+-----------+--------------------+--------------------------------+------------------------+-------------------------+-------------------------+-------------+---------------+-------------------+--------------------+-------------------+--------------------
- 1 | test_preparer | ChunkText | public | chunked_data_destination_table | id | chunks | {"desired_length": 100} | Table | public | test_source_table | content | id |
+ 1 | test_preparer | ChunkText | public | chunked_data_destination_table | id | chunk | {"desired_length": 100} | Table | public | test_source_table | content | id |
(1 row)
```
From 23feea1870eeee2e6d04267ad82f07735b0f6c85 Mon Sep 17 00:00:00 2001
From: Noah Baculi
Date: Wed, 14 May 2025 21:21:33 -0700
Subject: [PATCH 11/42] add output to chunk auto processing ex
---
.../examples/chunk_text_auto_processing.mdx | 48 ++++++++++++++++++-
1 file changed, 46 insertions(+), 2 deletions(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx
index 629bd2f3daf..1991b802cba 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx
@@ -4,7 +4,11 @@ navTitle: Auto Processing
description: Examples of using the preparer auto processing in AI Accelerator.
---
-Examples of using preparer auto processing with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator.
+This example uses preparer auto processing with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator.
+
+!!! Note
+Many of the small confirmation output notices have been removed for brevity.
+!!!
## Preparer with table data source
@@ -22,7 +26,7 @@ SELECT aidb.create_table_preparer(
source_table => 'source_table__1628',
source_data_column => 'content',
destination_table => 'chunked_data__1628',
- destination_data_column => 'chunks',
+ destination_data_column => 'chunk',
source_key_column => 'id',
destination_key_column => 'id',
options => '{"desired_length": 1, "max_length": 1000}'::JSONB -- Configuration for the ChunkText operation
@@ -32,14 +36,54 @@ SELECT aidb.set_auto_preparer('preparer__1628', 'Live');
INSERT INTO source_table__1628
VALUES (1, 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence is 145 characters. This enables processing or storage of data in manageable parts.');
+```
+
+```sql
SELECT * FROM chunked_data__1628;
+__OUTPUT__
+ id | part_id | unique_id | chunk
+----+---------+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------
+ 1 | 0 | 1.part.0 | This is a significantly longer text example that might require splitting into smaller chunks.
+ 1 | 1 | 1.part.1 | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence is 145 characters.
+ 1 | 2 | 1.part.2 | This enables processing or storage of data in manageable parts.
+(3 rows)
+```
+
+```sql
INSERT INTO source_table__1628
VALUES (2, 'This sentence should be its own chunk. This too.');
+```
+
+```sql
SELECT * FROM chunked_data__1628;
+__OUTPUT__
+ id | part_id | unique_id | chunk
+----+---------+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------
+ 1 | 0 | 1.part.0 | This is a significantly longer text example that might require splitting into smaller chunks.
+ 1 | 1 | 1.part.1 | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence is 145 characters.
+ 1 | 2 | 1.part.2 | This enables processing or storage of data in manageable parts.
+ 2 | 0 | 2.part.0 | This sentence should be its own chunk.
+ 2 | 1 | 2.part.1 | This too.
+(5 rows)
+```
+
+```sql
DELETE FROM source_table__1628 WHERE id = 1;
+```
+
+```sql
SELECT * FROM chunked_data__1628;
+__OUTPUT__
+ id | part_id | unique_id | chunk
+----+---------+-----------+----------------------------------------
+ 2 | 0 | 2.part.0 | This sentence should be its own chunk.
+ 2 | 1 | 2.part.1 | This too.
+(2 rows)
+```
+
+```sql
SELECT aidb.set_auto_preparer('preparer__1628', 'Disabled');
```
From 8439f24eba101f4bbd7687adacd96fd0e725d90c Mon Sep 17 00:00:00 2001
From: Noah Baculi
Date: Wed, 14 May 2025 21:31:23 -0700
Subject: [PATCH 12/42] add output to chunk text ex
---
.../preparers/examples/chunk_text.mdx | 68 +++++++++++++++----
1 file changed, 55 insertions(+), 13 deletions(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx
index 906f59d5d99..8d92bb4d5d7 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx
@@ -11,20 +11,61 @@ These examples use preparers with the [ChunkText operation](../primitives#chunk-
-- Only specify a desired length
SELECT * FROM aidb.chunk_text('This is a simple test sentence.', '{"desired_length": 10}');
+__OUTPUT__
+ part_id | chunk
+---------+-----------
+ 0 | This is a
+ 1 | simple
+ 2 | test
+ 3 | sentence.
+(4 rows)
+```
+
+```sql
-- Specify a desired length and a maximum length
SELECT * FROM aidb.chunk_text('This is a simple test sentence.', '{"desired_length": 10, "max_length": 15}');
+__OUTPUT__
+ part_id | chunk
+---------+-------------
+ 0 | This is a
+ 1 | simple test
+ 2 | sentence.
+(3 rows)
+```
+
+```sql
-- Named parameters
-SELECT
- chunk_id,
- chunk
-FROM aidb.chunk_text(
+SELECT * FROM aidb.chunk_text(
input => 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence is 145 characters. This enables processing or storage of data in manageable parts.',
- options => '{"desired_length": 10}'
+ options => '{"desired_length": 40}'
);
+__OUTPUT__
+ part_id | chunk
+---------+----------------------------------------
+ 0 | This is a significantly longer text
+ 1 | example that might require splitting
+ 2 | into smaller chunks.
+ 3 | The purpose of this function is to
+ 4 | partition text data into segments of a
+ 5 | specified maximum length, for example,
+ 6 | this sentence is 145 characters.
+ 7 | This enables processing or storage of
+ 8 | data in manageable parts.
+(9 rows)
+```
+
+```sql
-- Semantic chunking to split into the largest continuous semantic chunk that fits in the max_length
SELECT * FROM aidb.chunk_text('This sentence should be its own chunk. This too.', '{"desired_length": 1, "max_length": 1000}');
+
+__OUTPUT__
+ part_id | chunk
+---------+----------------------------------------
+ 0 | This sentence should be its own chunk.
+ 1 | This too.
+(2 rows)
```
## Preparer with table data source
@@ -56,12 +97,13 @@ SELECT aidb.bulk_data_preparation('preparer__1628');
SELECT * FROM chunked_data__1628;
--- Unnest chunk text arrays
-SELECT
- id,
- chunk_number,
- chunk
-FROM
- chunked_data__1628,
- unnest(chunks) WITH ORDINALITY AS chunk_list(chunk, chunk_number);
+__OUTPUT__
+ id | part_id | unique_id | chunks
+----+---------+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------
+ 1 | 0 | 1.part.0 | This is a significantly longer text example that might require splitting into smaller chunks.
+ 1 | 1 | 1.part.1 | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters.
+ 1 | 2 | 1.part.2 | This enables processing or storage of data in manageable parts.
+ 2 | 0 | 2.part.0 | This sentence should be its own chunk.
+ 2 | 1 | 2.part.1 | This too.
+(5 rows)
```
From f008c1b5b510d2a4dec1f7ab0a32e4f43d082538 Mon Sep 17 00:00:00 2001
From: Noah Baculi
Date: Wed, 14 May 2025 21:33:45 -0700
Subject: [PATCH 13/42] add output to parse html ex
---
.../preparers/examples/parse_html.mdx | 56 +++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_html.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_html.mdx
index bf611ad5515..19e82bd8e6d 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_html.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_html.mdx
@@ -14,6 +14,17 @@ SELECT * FROM aidb.parse_html(
'<h1>Hello World Heading</h1>
<p>Hello World paragraph</p>
'
);
+__OUTPUT__
+ parse_html
+-----------------------
+ Hello World Heading +
+ +
+ Hello World paragraph+
+
+(1 row)
+```
+
+```sql
-- Parse Hello World HTML to plaintext
SELECT * FROM aidb.parse_html(
html =>
@@ -33,6 +44,24 @@ SELECT * FROM aidb.parse_html(
options => '{"method": "StructuredPlaintext"}' -- Default
);
+__OUTPUT__
+ parse_html
+-----------------------------------------------------------
+ Hello, world! +
+ +
+ This is my first web page. +
+ +
+ It contains some bold text, some italic test, and a link.+
+ +
+ Postgres Logo Image +
+ List item +
+ List item +
+ List item +
+
+(1 row)
+```
+
+```sql
-- Parse Hello World HTML to markdown-esque text that retains some syntactical context
SELECT * FROM aidb.parse_html(
html =>
@@ -51,6 +80,22 @@ SELECT * FROM aidb.parse_html(
',
options => '{"method": "StructuredMarkdown"}'
);
+
+__OUTPUT__
+ parse_html
+---------------------------------------------------------------------------------------
+ # Hello, world! +
+ +
+ This is my first web page. +
+ +
+ It contains some **bold text**, some *italic test*, and a [link](https://google.com).+
+ +
+  +
+ 1. List item +
+ 2. List item +
+ 3. List item +
+
+(1 row)
```
## Preparer with table data source
@@ -81,4 +126,15 @@ SELECT aidb.create_table_preparer(
SELECT aidb.bulk_data_preparation('preparer__2772');
SELECT * FROM destination_table__2772;
+
+__OUTPUT__
+ id | parsed_html
+----+-------------------------------------------------------
+ 1 | Hello World Heading +
+ | +
+ | Hello World paragraph +
+ |
+ 2 | This is some bold text, some italic test, and a link.+
+ |
+(2 rows)
```
From 57a6d91c0c49373118d7966471fe70b9f530e509 Mon Sep 17 00:00:00 2001
From: Noah Baculi
Date: Wed, 14 May 2025 21:34:38 -0700
Subject: [PATCH 14/42] add output to parse pdf ex
---
.../preparers/examples/parse_pdf.mdx | 32 ++++++++++++++-----
1 file changed, 24 insertions(+), 8 deletions(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx
index 8f2f3f99328..0c393de126a 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx
@@ -15,11 +15,27 @@ SELECT * FROM aidb.parse_pdf(
decode('255044462d312e340a25b89a929d0a312030206f626a3c3c2f547970652f436174616c6f672f50616765732033203020523e3e0a656e646f626a0a322030206f626a3c3c2f50726f64756365722847656d426f782047656d426f782e50646620312e37202831372e302e33352e313034323b202e4e4554204672616d65776f726b29292f4372656174696f6e4461746528443a32303231313032383135313732312b303227303027293e3e0a656e646f626a0a332030206f626a3c3c2f547970652f50616765732f4b6964735b34203020525d2f436f756e7420312f4d65646961426f785b302030203539352e3332203834312e39325d3e3e0a656e646f626a0a342030206f626a3c3c2f547970652f506167652f506172656e742033203020522f5265736f75726365733c3c2f466f6e743c3c2f46302036203020523e3e3e3e2f436f6e74656e74732035203020523e3e0a656e646f626a0a352030206f626a3c3c2f4c656e6774682035393e3e73747265616d0a42540a2f46302031322054660a3120302030203120313030203730322e3733363636363720546d0a2848656c6c6f20576f726c642129546a0a45540a656e6473747265616d0a656e646f626a0a362030206f626a3c3c2f547970652f466f6e742f537562747970652f54797065312f42617365466f6e742f48656c7665746963612f4669727374436861722033322f4c61737443686172203131342f5769647468732037203020522f466f6e7444657363726970746f722038203020523e3e0a656e646f626a0a372030206f626a5b3237382032373820302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203732322030203020302030203020302030203020302030203020302030203020393434203020302030203020302030203020302030203020302030203535362035353620302030203020302030203020323232203020302035353620302030203333335d0a656e646f626a0a382030206f626a3c3c2f547970652f466f6e7444657363726970746f722f466c6167732033322f466f6e744e616d652f48656c7665746963612f466f6e7446616d696c792848656c766574696361292f466f6e74576569676874203530302f4974616c6963416e676c6520302f466f6e7442426f785b2d313636202d3232352031303030203933315d2f436170486569676874203731382f58486569676874203532332f417363656e74203731382f44657363656e74202d3230372f5374656d482037362f5374656d562038383e3e0a656e646f626a0a787265660a3020390a303030303030303030302036353533352066200a30303030303030303135203030303030206e200a30303030303030303539203030303030206e200a30303030303030313739203030303030206e200a30303030303030323537203030303030206e200a30303030303030333436203030303030206e200a30303030303030343531203030303030206e200a30303030303030353733203030303030206e200a30303030303030373733203030303030206e200a747261696c65720a3c3c2f526f6f742031203020522f49445b3c39333932413539463342453742383430383035443632373436453841344632393e3c39333932413539463342453742383430383035443632373436453841344632393e5d2f496e666f2032203020522f53697a6520393e3e0a7374617274787265660a3938380a2525454f460a', 'hex')
);
+__OUTPUT__
+ part_id | text
+---------+--------------
+ 0 | Hello World!+
+ |
+(1 row)
+```
+
+```sql
-- Manually specify the default options
SELECT * FROM aidb.parse_pdf(
bytes => decode('255044462d312e340a25b89a929d0a312030206f626a3c3c2f547970652f436174616c6f672f50616765732033203020523e3e0a656e646f626a0a322030206f626a3c3c2f50726f64756365722847656d426f782047656d426f782e50646620312e37202831372e302e33352e313034323b202e4e4554204672616d65776f726b29292f4372656174696f6e4461746528443a32303231313032383135313732312b303227303027293e3e0a656e646f626a0a332030206f626a3c3c2f547970652f50616765732f4b6964735b34203020525d2f436f756e7420312f4d65646961426f785b302030203539352e3332203834312e39325d3e3e0a656e646f626a0a342030206f626a3c3c2f547970652f506167652f506172656e742033203020522f5265736f75726365733c3c2f466f6e743c3c2f46302036203020523e3e3e3e2f436f6e74656e74732035203020523e3e0a656e646f626a0a352030206f626a3c3c2f4c656e6774682035393e3e73747265616d0a42540a2f46302031322054660a3120302030203120313030203730322e3733363636363720546d0a2848656c6c6f20576f726c642129546a0a45540a656e6473747265616d0a656e646f626a0a362030206f626a3c3c2f547970652f466f6e742f537562747970652f54797065312f42617365466f6e742f48656c7665746963612f4669727374436861722033322f4c61737443686172203131342f5769647468732037203020522f466f6e7444657363726970746f722038203020523e3e0a656e646f626a0a372030206f626a5b3237382032373820302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203020302030203732322030203020302030203020302030203020302030203020302030203020393434203020302030203020302030203020302030203020302030203535362035353620302030203020302030203020323232203020302035353620302030203333335d0a656e646f626a0a382030206f626a3c3c2f547970652f466f6e7444657363726970746f722f466c6167732033322f466f6e744e616d652f48656c7665746963612f466f6e7446616d696c792848656c766574696361292f466f6e74576569676874203530302f4974616c6963416e676c6520302f466f6e7442426f785b2d313636202d3232352031303030203933315d2f436170486569676874203731382f58486569676874203532332f417363656e74203731382f44657363656e74202d3230372f5374656d482037362f5374656d562038383e3e0a656e646f626a0a787265660a3020390a303030303030303030302036353533352066200a30303030303030303135203030303030206e200a30303030303030303539203030303030206e200a30303030303030313739203030303030206e200a30303030303030323537203030303030206e200a30303030303030333436203030303030206e200a30303030303030343531203030303030206e200a30303030303030353733203030303030206e200a30303030303030373733203030303030206e200a747261696c65720a3c3c2f526f6f742031203020522f49445b3c39333932413539463342453742383430383035443632373436453841344632393e3c39333932413539463342453742383430383035443632373436453841344632393e5d2f496e666f2032203020522f53697a6520393e3e0a7374617274787265660a3938380a2525454f460a', 'hex'),
options => '{"method": "Structured", "allow_partial_parsing": true}' -- Default
);
+
+__OUTPUT__
+ part_id | text
+---------+--------------
+ 0 | Hello World!+
+ |
+(1 row)
```
## Preparer with table data source
@@ -51,12 +67,12 @@ SELECT aidb.bulk_data_preparation('preparer__6124');
SELECT * FROM destination_table__6124;
--- Unnest chunk text arrays
-SELECT
- id,
- page_number,
- parsed_text
-FROM
- destination_table__6124,
- unnest(parsed_pdf) WITH ORDINALITY AS pdf_pages(parsed_text, page_number);
+__OUTPUT__
+ id | part_id | unique_id | parsed_pdf
+----+---------+-----------+--------------
+ 1 | 0 | 1.part.0 | Hello World!+
+ | | |
+ 2 | 0 | 2.part.0 | Hello World!+
+ | | |
+(2 rows)
```
From 56f1f9a766e6fe3af126f3bab6786323fb56bb02 Mon Sep 17 00:00:00 2001
From: Noah Baculi
Date: Wed, 14 May 2025 21:36:09 -0700
Subject: [PATCH 15/42] add output to ocr ex
---
.../preparers/examples/perform_ocr.mdx | 46 +++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx
index 0985759af7c..9f2fbcd5159 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx
@@ -28,11 +28,25 @@ SELECT * FROM aidb.perform_ocr(
options => '{"model": "my_paddle_ocr_model"}'
);
+__OUTPUT__
+ part_id | text
+---------+------------------
+ 0 | Tesseract sample
+(1 row)
+```
+
+```sql
-- Positional arguments
SELECT * FROM aidb.perform_ocr(
decode('/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAUFBQUFBQUGBgUICAcICAsKCQkKCxEMDQwNDBEaEBMQEBMQGhcbFhUWGxcpIBwcICkvJyUnLzkzMzlHREddXX0BBQUFBQUFBQYGBQgIBwgICwoJCQoLEQwNDA0MERoQExAQExAaFxsWFRYbFykgHBwgKS8nJScvOTMzOUdER11dff/CABEIAWYDKgMBIgACEQEDEQH/xAAvAAEAAQUBAQEAAAAAAAAAAAAAAQIDBAcIBgkFAQEAAAAAAAAAAAAAAAAAAAAA/9oADAMBAAIQAxAAAADsuKLhCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRCRE2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAeS/INiNdjYj8z9MAAAAAAAAAAFJUiQAAAAAAAAA8l60AAAAAAAAAAAAAtXbV0pqpqAAAAAAAAAAESNE617AwD5kdJap+gp5L2PC/459AJ5c/ROko4DxD6EU+I4rPoY+enVJuJzfpI79ngXrU2LT88PfnaEcVfhHek8K9jno6fnrmH0A5c/S5YO89icId3FSOODsiOC847meN4qPoPPzx6iN0zw9hHeD5yeqO8Y+cfeZ6ev5uepO+nzz71OD/AKDfPnsc2BHz7xj6HPBe9AAAAAAAAAAALV21dKaqagAAAAAAAAAABgZ+AcB9U8rddHGO7vC9XHz43/me6NMbI3JxwdYcV9C7HNYbf17pY1f0lo3uI4n3V+3+oaE745D6qNBYHPvdp89uldE78PWb15730as1P+h+cbw2VrDZw4F7757Nd+52Vwsd76d8b6I8H6LZ2sjWvf8AwN9Bjh7qvlbrY0P7bxfuDkP6Q/M/6UnPuDbzzRP6f5npD9GxuT9Y557i576EAAAAAAAAAAALV21dKaqagAAAAAAAAAAB+f8AoDkPrmscZeZ7zHN3ot3jgnI7uHhuRe8xwb0NuwaL5774HEXUXuxzt0HeHCE93DgLq7Z44Rx+9hqWxuEcQ9vRJHMHUA4O/Y7ZGpOWfoCOJMTuYci9cVDmPf8A+6NIei2bQfMjZn5vf5wr2z+kOReivYDgivvMeI9wAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAiRh5gAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABau2rpTVTUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAWrtq6U1U1AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFq7aulNVu2ZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHGQxxkMcZDHFd2xfIgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIrD//EADIQAAAFBAEEAQMDAgcBAAAAAAABAwQFAgYHERQIEDI0YBIhQBYxNxUXExgiMzZBUTX/2gAIAQEAAQgAM/2IaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMaMGejIjCn+9R8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl8kr80+1HsJfJK/NPtR7CXySvzT7Uewl+PWJq+bPtt3Q0mv7s40H92caA8uY2IvtEzEXOsUZCK/FMGYI/w6vsQjr6tGWlFIljTv/v8qvzT7Uewl+PV+32yLgyLyLO0y7qS6V4JjHP3dOLLBaZCuv8Aobk+k6AFiWehYtuM4Fv/AKgRgxVVsEezBjf3BmewRgz0NmCP7gwdX/h1aLY2exv/AMI/uDFVWwZiqrY6nZyZhIi0zjMSvXL7G9ouXdPYx9QIwZ6BfuDq+w+obH1D6h9Q+ob0Nj6hsGMOfzu9HiPqBH+TX5p9qPYS/I0LiPcFOjplL68nBdZJuiqsreWX76v+4lLesZ/jvPVutq5YsIZrf3O8/TNyZ3y5I2OTSFhY7Hud7uaJTJxuRsoYkuNCPuS5L4joCyFrtDGYzJmSQd1xU3B5xxeknLr4ayVXkS3FFHubsyubKqRgYJpYGe7sbUzNdr5ZyDjC4k4a8MgTSieOLgmIiAzpdUPa0qzPEUNll3e8RO3FlDMt0zN0r2jZR4kzyuib6vFOWrzgL0ZWrdFz3JHWlBSM3Jr35ljME8sxgZayc8WEzrmDw9lx9kSMmIaRyVCZPim8VXe2LbfzKudmvWlAqPQzNma40blVs+1KMUZ7WQ/qJ42zFeFrXY3te8r7vBnY1tSE46j5TNmY3rpeMuC182Y1bFMr4OyO8yFbjs5LM+QbptfKRos6Es5Zg/x5dkyvnJ2ILppYzNUnmjNDp28g5STy/h+XZFI2fciV22n
DzidWYb7gLunq6GdjZ5vZrTMqxeSsoYluGiPuGCmWdww7CUZYc/nZ6MjX0yx/bjiVcsl83ZiXcPWD93mrD7xs6kMb36wyBbbeUQ/Hr80+1HsJfk3Ee4KdHTL/ACcM9SriIxjOKIYeynb+M05ZV5/mxt0K3SweZQZXFCdR2PZ2adR91RNn9TEtAs20XcJ3zhLLTqPSnb+sei4McvrYhrMv28sLSD+NcodRWPLqj14q48cwOP4ti5fWYtdbOjKbm5Zz/NfbtJERZgytb2TG0WbS2ZNaT6ZJMluma02U7c0tLvf+vs6h8K4muKidcSnVLaTYq6Y5tcLy6svQU296q5RZC2bYjKOnSAaxWN457QommqnWnXaOFrQsqfonYnqxMv6PZowyZHiqySMVHovvmmxrqtW/nV3xtu9VWkk0bijHuD8qz9Ekd9Wnbl4QpsrijMq4cxdG1QUHkbqGQu+3ZSChukv1r2GfUKHeYTQrYMGsYyaMWvVikmU9ai9ONo5tH2LaTZr1VoUfo6AVPAf8VQQx5ANbjzXWzdU0kRERdS1uspGw/wCrH0uy67yy5Rirhn+eXY6pZZda7ouMFsdSNqWxARcO1vDqLtS7rblIRx0pyaqc9cUaZfj1+afaj2EvybiPcFOjpl/k4ZphlZvGlxtkenJGyJn+sQk//bTH5Ee17xxknfqNqxV75zj7Euv9OO5CybEu9qi5eZ1xRbthpxsnCWdlF3a+E4K4ZOyrstTMMCu8Xv7AtgPoeUkGHTBKvm15ykRSszibQzc5bXCjjrHLpFNZvk55ivGqEcS8k6jpDAs6/jekoqDZ3mZz7xWMgZyQRxHbLXKGQXn6lj7CseCR+prPS8ZOZ7YvYvqohl3Vq29LJ9N11sZexW8IcjJM4pg6fvsU5ou/IN81xVfVgW4izRhqojxdZZkW9isN89wru917NfzuK8f3GVZvst2UzxjeDJCFz7dkypj3HiAwhiOyntmxdwymYCtG0Mdz6KfSWW0L1pLOP80tu3Vl/wDYs4WP/wAMtQdVf/CoMYC/iWCFm3OhaGYlJVw2dJOUUXDbqZu1kytJK3qOmaHVj7BcvVMM/wA8ux1Vw6qM/b8xTj23cYXhacRJNrqtbGFpwT+ZfYVua1LudTDqF/Hr80+1HsJfkyyCrqKlWyWEcSXpZV8HLzCqdKidSdd/dO00hLqzVkHavUi/QOLWxDg+myXZTk7l7EDXIyDZ2zY2P1D2jQbGJa4PypfEmg7vFTH1uHZX6NN1g7KlkSqjm0Xlk9RN2pkxlsQ4haY4aLuXWXcNMsip0SLJlY3UNahVMIm2sAXpc03RLX5eltqSNgTdvw+AMe3LYTe4051dJJwiskrceAr+tafVkrO/QfUJeRExmrs6e7ytqSil7XtO3Zaaxq2hL9mMBZFs2YqkbMe48z/exJs5vFeLo/GsWskWZscOskW22aMLKxbmi25q30zpFX7fbL+B1btk1bhtttb3UrFJUMG1qdPd2zs4lL33lXGTfIFsNY1tGWDn+zf8VhCRuA7/ALwVXf3vi3HWXLNvOPoLKOI7zufJSE9GfcZ/xjdl+SNuLwdrMXMbbsAwc56sefvm2othCYpt2UtOw4yHlLbs6i/MmzkFWlYnUJZlNcbCW50/XxdMyUnesZGs4dg0jmOOMS3lbmVHFxSF62bE31AuYiTVwxmCxpBeu1VMVZvv1y3TuawLHi7AgUYlh+PX5p9qPYS/K120Q0Q0Q0Q0NENENENDRDQ0NENF30Q0NENENENDRdtF+3fQ0Q0WxoaIaGiGiGiGi7q/7dQwwnWWd3lRl+/30Q0NENECIiGi/Jr80+1HsJfAzLYoYR6KhrIfn1+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2o9hL5JX5p9qPYS+SV+afaj2Evklfmn2WqIlaa6uUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgOUgProVqoOkVUFUDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEDbUEKaaaf2H//xABDEAABAwEFBQcBBAYIBwAAAAABAAIDBAUREpKhEBNhYrEGFDEyQVFgQCEiU1QVQlBSgpMjM0NVcoGywgcgMDSRs9H/2gAIAQEACT8A8SjojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojojovXZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fUdo6SinczGI5XLttZmddtrMzrttZv8xV8dXRy3hs0f2tJb4/sztFS1FoNLw6njde8GPz5frObZx6fUW9U0jxAyLdRxtcu1lc90FPJKGmBgBLGkq0JaSLu80u9jYHOvjXa+v/AJDFWPqooHvcJXtDXHEb/Af8/h9Bac9IZqucSmJ5aTc1TvmqJaO+SR5vc44vpPzVp9T9ZzbOPT6n0s2quyFfkKlSBkcbS9zifsAb9pKE0NKHmNhg/rZgPsLy70arQrH7sbx4gqzM9v8ACnD9JNYe61HlMnuxw/fVzbSqot6+o/CjVrVMYnG8jE1YYXEH1whTTz0wcHS01QcYfG7xdG5f0sHdWTQR+rzJ5Qq+aOmidc5scu4gi9mK16
s0sbxfLHUGoY3/ABgpgZaVE4RVIHg/1EgCwPteaO+SXzCBjv8AeVaVVHvRvGCerMDsoXeZ6IOAmhqPtexhPnjcq4tcbO31NUR+mLwIKtGeutmsqg2nmm+/uIsIBLPckplqGzXMmc99Q8hpxs/cUksbY5jSmSEXzVEwNxwezFa8u9wYt0a8h6nmqYZ6plC5lR54Ji7A1SYaejjxkDxe7wDBxJKkngg+1zaWmO6jiZ7yPVqVckEH35TBVGcsHuWlBjLfpKKSWGSMYWzM8gdweHFVU8sT5ZW0ollD7n/rqtqWdm+9QPwGYBndhL94Ydr3wSQvZDPNH9sss0g8katWYTFuLdOryHqWaeB84pH94/raaRxuDuLUwybkBsUXrJJIbmtVdNHSxuIO7k7vTx4v1VbNX3eJwxTQ1RnDPbECmN/SNnSMimkHhIJAS1ytipZQQRU0rqWN9zHKokorMc4iCPfd3i/gVXUyNYQZ6SeTeMmid6scU6WhsdkhbEGy7iLN4uerVqRj+8zFNv4Jg0+5UeA1dNjey+/C5t4IVsz1TI6mtghglOONpc4sblVqVMTJhjiE1UYC4cGhSTzwNI3lNUnGHx+8b1JipquASMP+L0dxC/M2n1KaJJvJTQeBkkKrJ4qMO/Vl7rTt4MKral1KXAXul7xTycpKYI52nBVQfhyj6jm2cen1PpZtVdkK/I1CfhfUGOnBBuID1YE9bXVTgGzxva3BEB5V2Rr/AOexUT7PgmtOCUQuIJbicA//ACKpH1MDKQR1LGeZmFWJ3tsDBGJYzuprmqIxVsbcEIqgY/MfKHBNZHuoIu4sLvu/0P2gf5hWOdzLLfUUtQwt++BdiY5WTVUsFVHupmkCVjghA6mrLt5JE8uxYfAG9UcldTxWnNO6naQC7CTgAv8AQFdkK7+exdn56OvpHkb+V7XYoj4tTi99LR1MN/sxktzQoWyiyoY9yx/4s5P3tloystgGR4iMxneDN5jgVi11W7wYTdEFSimnrO0FBIYgCAwb5gATiGVtdLLIPfu7P/siiHebUlmqJ3n2a8xsGUJrXteC1zXC8EO9CpKwTtbIwMfJezDIF+bqV+S/3O20sktHNVx1kFREMe5mFxucuzpMgAa6ekIGZpTGPtt+7O7nvhe8xqR7KCGQVDnMl3IaYwftJVXU1UMc75SIgZcTj7vK7PTMiqmFk1RMQ4hi/fouj15JY6RmYqNsUFPEyKJjfAMaLgmAPkpKkHMFC1jTZ0Eh9LzI0PJTBjbaXVhXtP1TA+FlqVtQ5h9dy8vQAA+wAeiiAqrOqY8D/dkieXsoa26L+ML83aX+op5EFLRh4Z7ueux1YI6WBkZLZWAOcux9ZdUxEMe6RhDH+j1J/RTUrJAznH1HNs49PqfSzaq7IV+RqFFjljiE7GjzExlWXST15eJqV84F7m+BYF2Ss4Aeu7X/AA6pKyYVrIGVceHDiVgTSsDID3gPAbgmCsChqmTRh7JWxgEh3qCFVPjZVzPYaR5xFuEediop7Q3VUaIgPuOEOIa4kqw6fFE8xTU1QGyPCphZFRTQSTB8Trove5zSpXvo5qGSVzB5Q+FwAeqBktlstWZsscovaYpvK7Vdl7NkikaHMexgIcHeoK7EUFbV1TzdTsAa5sbfFxVgCyKers2SdlI32c8XPX4lOm4paShnnY26+90bCQP/ACFVvmbuZKyYF9zp34gMK7OWdTxxC8vMTSAPcl6LDRO7RWZFCWC5pEL2RphLKGufHNwE7fE5FOzv1kySMfET950T3l7XBVLIaanidJLI43BrGi8lUVKyy2RVNRKQw42RDyakL83Uoj/sbjndtsGSnAr5qE1M8jTGHMcWAldmaXfP/tYRunDiCFakr2vibVRfiwEOICkkjFtUMVTW85ETDhVBFaNbWY33Sm9kVzixWdRUs9ZSvpaaNkTQ8umC/Eouj17UOz8rVdWL+6qP/wBTV/eY/wBDl+9OjdTC1ayKd3tHM8sJUzJYZWB0b2G8ODvUFTtdXV08b3RjzMijUZYa6te9vM2ML83aX+oqAiKamfC9/GNdm7Nkl3EbKhpYC4StFxxhdlrOENMwuwlgBf7MHErsPFY/dY2MdUAg4yf1PqObZx6fUgGSekmjYD4XuaQFSQRUndZo8TJg43yJoc1zS1zT4EO9FUhoMpl7sX7p8TuQqrtEU7xc8PnAap2VNrf2IZ5YL+r1MyltimZcyU+SSP2ep6zu7XXNEE4cxWk6KIOF8k8u8kw+zGhUx/Rgg3QH6wP4l6tEzRknBLTS7p/8TSqmsFK/z94nDI1OKi16pobLMBcxjP3GKZlLbEEeASHySNHgx6qa1tMz7g7vUB0StAiEPBkY6XeTTAfbhVOwSSUHd6WLytGG64KmijdVPhMOB+NMDo5GOY5p9Q4XEKd08AlL6Z8MuCeIFWnVQUZuEneZw1t38CZJaAZHC8zMIY+OpYg6prqqCdlaHuDnXOlJj/zDVXmojBO4mil3c4BVVUmmxjGKmoAYFM2ptKrINVVe4Hgxns0KoZFX0VRv4d55X/dIc0lVZhsmntGCSohZV3s3QlBftnigtGQX1MDzgErx9gcFU2juQLmXTtcq4iLGHyxbzezy+zT7KRlLV0Lg+iP6jcIuwkexCNZDTueQe7TAxK3XtkER3Mb5N7IXdGBU09JZTq2H9IvZIN1NDG9UkL6AClveZA3yFFUsMrKWCdk2OTBcZCCgN9S0FNDIAbwHxsAOoUDJJ4a4SyBz8ADcJCiZHVwiTG1hxAXkkKpMBmntN7JPaSMuIVfVyUYJDO6zAsVaYYXODpy9+9nkChbFTU8QjiY30DVSRMs581a8PbMC66YnCmnC770Uo80UjfBwVdJNCfLLSS4C4c7SrRlZTA+ermvDOIaEMb/PUTnzSv8AqObZx6ftQf8AX9nJhwie1P2BzbOPT4JQwRyG++RsbQ84uLR+wObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZx6fJObZ4BPOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKkOUqQ5SpDlKN4F/p8mGz//xAAUEQEAAAAAAAAAAAAAAAAAAACg/9oACAECAQE/ABK//8QAFBEBAAAAAAAAAAAAAAAAAAAAoP/aAAgBAwEBPwASv//Z', 'base64'),
'{"model": "my_paddle_ocr_model"}'
);
+
+__OUTPUT__
+ part_id | text
+---------+------------------
+ 0 | Tesseract sample
+(1 row)
```
## Preparer with table data source
@@ -62,6 +76,38 @@ SELECT aidb.create_table_preparer(
SELECT aidb.bulk_data_preparation('preparer__1527');
SELECT * FROM ocr_data__1527;
+
+__OUTPUT__
+ id | part_id | unique_id | parsed__text
+----+---------+-----------+--------------------------------------------
+ 1 | 0 | 1.part.0 | Trunch Parish Council
+ 1 | 1 | 1.part.1 | BANK RECONCILIATION AS AT 31STOCTOBER 2019
+ 1 | 2 | 1.part.2 | Account:
+ 1 | 3 | 1.part.3 | 14,389.43
+ 1 | 4 | 1.part.4 | BANK STATEMENT BALANCE 3OTH SEPTEMBER 2019
+ 1 | 5 | 1.part.5 | 83.60
+ 1 | 6 | 1.part.6 | PREVIOUS OUTSTANDING CHEQUES
+ 1 | 7 | 1.part.7 | 14,305.83
+ 1 | 8 | 1.part.8 | CASHBOOK BALANCE 31ST OCTOBER 2019
+ 1 | 9 | 1.part.9 | ADD CHEQUES OUTSTANDING:
+ 1 | 10 | 1.part.10 | *
+ 1 | 11 | 1.part.11 | 101719
+ 1 | 12 | 1.part.12 | 83.60*
+ 1 | 13 | 1.part.13 | *
+ 1 | 14 | 1.part.14 | *
+ 1 | 15 | 1.part.15 | 83.60
+ 1 | 16 | 1.part.16 | OUTSTANDING CHEQUES
+ 1 | 17 | 1.part.17 | 9,148.00
+ 1 | 18 | 1.part.18 | RECEIPTS
+ 1 | 19 | 1.part.19 | 4,309.94
+ 1 | 20 | 1.part.20 | PAYMENTS
+ 1 | 21 | 1.part.21 | 19,227.49
+ 1 | 22 | 1.part.22 | BALANCE 31STOCTOBER2019
+ 1 | 23 | 1.part.23 | 19,227.49*
+ 1 | 24 | 1.part.24 | BALANCE AS PER BANK STATEMENT
+ 1 | 25 | 1.part.25 | 0.00
+ 1 | 26 | 1.part.26 | DIFFERENCE
+(27 rows)
```
## Model compatibility
From 22f9cf9fc826c366f38cadd696123d00cdf475da Mon Sep 17 00:00:00 2001
From: Noah Baculi
Date: Wed, 14 May 2025 21:38:11 -0700
Subject: [PATCH 16/42] add output to summarize ex
---
.../preparers/examples/summarize_text.mdx | 25 +++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/summarize_text.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/summarize_text.mdx
index 55e662801b3..c8039ebbe45 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/summarize_text.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/summarize_text.mdx
@@ -24,11 +24,25 @@ SELECT * FROM aidb.summarize_text(
options => '{"model": "model__1952"}'
);
+__OUTPUT__
+ summarize_text
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ the girl yelled to her Mom as she heard the clicking, scratching noises outside of the living room window . she regretted watching the horror show she had been tuned into for the last half hour . the front door blew open with a thunderous noise .
+(1 row)
+```
+
+```sql
-- Positional arguments
SELECT * FROM aidb.summarize_text(
'There are times when the night sky glows with bands of color. The bands may begin as cloud shapes and then spread into a great arc across the entire sky. They may fall in folds like a curtain drawn across the heavens. The lights usually grow brighter, then suddenly dim. During this time the sky glows with pale yellow, pink, green, violet, blue, and red. These lights are called the Aurora Borealis. Some people call them the Northern Lights. Scientists have been watching them for hundreds of years. They are not quite sure what causes them. In ancient times Long Beach City College WRSC Page 2 of 2 people were afraid of the Lights. They imagined that they saw fiery dragons in the sky. Some even concluded that the heavens were on fire.',
'{"model": "model__1952"}'
);
+
+__OUTPUT__
+ summarize_text
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ the night sky glows with bands of color . they may begin as cloud shapes and then spread into a great arc across the entire sky . the lights usually grow brighter, then suddenly dim .
+(1 row)
```
## Preparer with table data source
@@ -50,7 +64,7 @@ SELECT aidb.create_table_preparer(
source_table => 'source_table__1952',
source_data_column => 'content',
destination_table => 'summarized_data__1952',
- destination_data_column => 'summaries',
+ destination_data_column => 'summary',
source_key_column => 'id',
destination_key_column => 'id',
options => '{"model": "model__1952"}'::JSONB -- Configuration for the SummarizeText operation
@@ -59,6 +73,13 @@ SELECT aidb.create_table_preparer(
SELECT aidb.bulk_data_preparation('preparer__1952');
SELECT * FROM summarized_data__1952;
+
+__OUTPUT__
+ id | summary
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ 1 | the girl yelled to her Mom as she heard the clicking, scratching noises outside of the living room window . she regretted watching the horror show she had been tuned into for the last half hour . the front door blew open with a thunderous noise .
+ 2 | the night sky glows with bands of color . they may begin as cloud shapes and then spread into a great arc across the entire sky . the lights usually grow brighter, then suddenly dim .
+(2 rows)
```
## Model compatibility
@@ -88,7 +109,7 @@ SELECT aidb.create_table_preparer(
source_table => 'source_table__1952',
source_data_column => 'content',
destination_table => 'summarized_data__1952',
- destination_data_column => 'summaries',
+ destination_data_column => 'summary',
options => '{"model": "bert_model"}'::JSONB -- Incompatible model
);
__OUTPUT__
From 7fc300944e06399040087a345ff0d61a2f4803d1 Mon Sep 17 00:00:00 2001
From: Noah Baculi
Date: Wed, 14 May 2025 21:57:09 -0700
Subject: [PATCH 17/42] add tips to reference new Unnesting concept
---
.../edb-postgres-ai/ai-accelerator/preparers/concepts.mdx | 6 ++++++
.../ai-accelerator/preparers/examples/chunk_text.mdx | 4 ++++
.../preparers/examples/chunk_text_auto_processing.mdx | 4 ++++
.../ai-accelerator/preparers/examples/parse_pdf.mdx | 4 ++++
.../ai-accelerator/preparers/examples/perform_ocr.mdx | 4 ++++
.../ai-accelerator/preparers/primitives.mdx | 7 ++++---
.../edb-postgres-ai/ai-accelerator/preparers/usage.mdx | 4 ++++
7 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/concepts.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/concepts.mdx
index f5667266fdc..6e38969954b 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/concepts.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/concepts.mdx
@@ -34,6 +34,12 @@ Bulk data preparation performs a preparer's associated operation for all of the
Bulk data preparation does not delete existing destination data unless it conflicts with newly generated data. It is recommended to configure separate destination tables for each preparer.
!!!
+## Unnesting
+
+Some Preparer [Primitives](./primitives) transform the shape of the data they are given. For example, `ChunkText` receives one text block and produces one or more text blocks. Rather than return nested collections of results, these Primitives automatically unnest (or "explode") their output, using a new `part_id` column to track the additional dimension.
+
+You can see this in action in [Primitives](./primitives) and in the applicable [examples](./examples).
+
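+As a minimal sketch (hypothetical input; output omitted), a single call like the following returns one row per chunk, with `part_id` numbering the chunks of each input:
+
+```sql
+SELECT * FROM aidb.chunk_text('First sentence. Second sentence.', '{"desired_length": 1}');
+```
+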
## Consistency with source data
To ensure correct and consistent data, the prepared destination data must be in sync with the source data. In the case of the table data source, you can enable preparer auto processing to inform the preparer pipeline about changes to the source data.
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx
index 8d92bb4d5d7..0c617d70705 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text.mdx
@@ -5,6 +5,10 @@ description: Examples of using preparers with the ChunkText operation in AI Acce
---
These examples use preparers with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator.
+!!! Tip
+This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail.
+!!!
+
## Primitive
```sql
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx
index 1991b802cba..d40f901bc12 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx
@@ -10,6 +10,10 @@ Example of using preparer auto processing with the [ChunkText operation](../prim
Many of the small confirmation output notices have been removed for brevity.
!!!
+!!! Tip
+This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail.
+!!!
+
## Preparer with table data source
```sql
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx
index 0c393de126a..55b3c5bf616 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/parse_pdf.mdx
@@ -6,6 +6,10 @@ description: Examples of using preparers with the ParsePdf operation in AI Accel
These examples use preparers with the [ParsePdf operation](../primitives#parse-pdf) in AI Accelerator.
+!!! Tip
+This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail.
+!!!
+
## Primitive
```sql
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx
index 9f2fbcd5159..befcbb7e777 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/perform_ocr.mdx
@@ -6,6 +6,10 @@ description: Examples of using preparers with the PerformOcr operation in AI Acc
Examples of using preparers with the [PerformOcr operation](../primitives#perform-ocr) in AI Accelerator.
+!!! Tip
+This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail.
+!!!
+
## Model creation (required)
This step is required for primitive single execution and for preparer bulk execution.
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx
index a661a32883b..62771150bbf 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/primitives.mdx
@@ -35,7 +35,7 @@ __OUTPUT__
- The `max_length` size is the maximum possible chunk size that can be generated. Setting it to a value larger than `desired_length` means that a chunk should be as close to `desired_length` as possible but can be larger if that means staying at a larger semantic level (see the sketch after the tip below).
!!! Tip
-This operation transforms the shape of the data, automatically unnesting collections. As a result, there may be multiple output rows for each input with a new `part_id` column to track the additional dimension.
+This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](./concepts#unnesting) for more detail.
!!!
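+
+For instance, here is a hedged sketch (output omitted since exact boundaries depend on the input): a `desired_length` of 1 with a generous `max_length` keeps each sentence intact instead of cutting at one character.
+
+```sql
+SELECT * FROM aidb.chunk_text('One sentence. Another sentence.', '{"desired_length": 1, "max_length": 1000}');
+```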
## Parse HTML
@@ -104,9 +104,10 @@ __OUTPUT__
- `Structured` (Default) — Algorithmic text extraction.
- The `allow_partial_parsing` flag determines whether to continue to parse PDFs when the parser encounters errors on one or more pages. Defaults to `true`.
+- The `part_id` column in the output references the index of the page from which the text was extracted.
!!! Tip
-This operation transforms the shape of the data, automatically unnesting collections. As a result, there may be multiple output rows for each input with a new `part_id` column to track the additional dimension.
+This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](./concepts#unnesting) for more detail.
!!!
## Summarize text
@@ -168,7 +169,7 @@ my_paddle_ocr_model
- The `model` is the name of the created model to use for OCR. The model must support the `perform_ocr` operation.
!!! Tip
-This operation transforms the shape of the data, automatically unnesting collections. As a result, there may be multiple output rows for each input with a new `part_id` column to track the additional dimension.
+This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](./concepts#unnesting) for more detail.
!!!
!!! Note
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx
index 43213770488..264a762892c 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/usage.mdx
@@ -32,6 +32,10 @@ aidb.create_table_preparer(
)
```
+!!! Tip
+The `source_key_column` must be a unique key for the source data. If the data source is the output of a Preparer that [transforms the data shape](./concepts#unnesting) with a `part_id` column, make sure to use the new `unique_id` column, as shown in the sketch below.
+!!!
+
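+As a minimal sketch (assumed table, column, and model names), a downstream preparer reading unnested ChunkText output would use `unique_id` as its key:
+
+```sql
+SELECT aidb.create_table_preparer(
+    name => 'downstream_preparer',
+    operation => 'SummarizeText',
+    source_table => 'chunked_data',          -- output table of a ChunkText preparer
+    source_key_column => 'unique_id',        -- the unnested key, not the original id
+    source_data_column => 'chunk',
+    destination_table => 'summarized_data',
+    destination_data_column => 'summary',
+    destination_key_column => 'chunk_unique_id',
+    options => '{"model": "my_model"}'::JSONB
+);
+```
+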
### Example: Creating a preparer
``` sql
From 334ad987407ff0cef66ad1acd1b67babeadccf77 Mon Sep 17 00:00:00 2001
From: Noah Baculi
Date: Wed, 14 May 2025 22:03:03 -0700
Subject: [PATCH 18/42] update source_key_column reference description to
include uniqueness recommendation
---
.../ai-accelerator/reference/knowledge_bases.mdx | 2 +-
.../edb-postgres-ai/ai-accelerator/reference/preparers.mdx | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/knowledge_bases.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/knowledge_bases.mdx
index 1588f8e5126..ae5dc00975d 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/knowledge_bases.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/knowledge_bases.mdx
@@ -135,7 +135,7 @@ Creates a knowledge base for a given table.
| source_table | regclass | Required | Name of the table to use as source. |
| source_data_column | TEXT | Required | Column name in source table to use. |
| source_data_format | [aidb.PipelineDataFormat](#aidbpipelinedataformat) | Required | Format of data in that column ("Text", "Image", "PDF"). |
-| source_key_column | TEXT | 'id' | Column to use as key to reference the rows. |
+| source_key_column | TEXT | 'id' | Unique column in the source table to use as key to reference the rows. |
| vector_table | TEXT | NULL | |
| vector_data_column | TEXT | 'embeddings' | |
| vector_key_column | TEXT | 'id' | |
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/preparers.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/preparers.mdx
index 84144aad956..4835af8c5f6 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/preparers.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/preparers.mdx
@@ -58,7 +58,7 @@ Creates a preparer with a source data table.
| source_data_column | TEXT | Required | Column in the source table containing the raw data |
| destination_table | TEXT | Required | Name of the destination table |
| destination_data_column | TEXT | Required | Column in the destination table for processed data |
-| source_key_column | TEXT | 'id' | Column to use as key to reference the rows |
+| source_key_column | TEXT | 'id' | Unique column in the source table to use as key to reference the rows. |
| destination_key_column | TEXT | 'id' | Key column in the destination table that references the `source_key_column` |
| options | JSONB | '{}'::JSONB | Configuration options for the data preparation operation. Uses the same API as the [data preparation primitives](../preparers/primitives.mdx). |
From 7470073f01c399a06db90e0c067d28c3f3102340 Mon Sep 17 00:00:00 2001
From: Noah Baculi
Date: Wed, 14 May 2025 22:32:58 -0700
Subject: [PATCH 19/42] init chained preparers ex
---
.../preparers/examples/chained_preparers.mdx | 121 ++++++++++++++++++
.../examples/chunk_text_auto_processing.mdx | 4 -
2 files changed, 121 insertions(+), 4 deletions(-)
create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chained_preparers.mdx
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chained_preparers.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chained_preparers.mdx
new file mode 100644
index 00000000000..dab273a4fc8
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chained_preparers.mdx
@@ -0,0 +1,121 @@
+---
+title: Preparer Chaining Example
+navTitle: Preparer Chaining
+description: Example of chaining multiple preparers with auto processing in AI Accelerator.
+---
+
+Example of chaining multiple preparers together with auto processing using the [ChunkText](../primitives#chunk-text) and [SummarizeText](../primitives#summarize-text) operations in AI Accelerator.
+
+## Create the first Preparer to chunk text
+
+```sql
+-- Create source test table
+CREATE TABLE source_table__1321
+(
+ id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
+ content TEXT NOT NULL
+);
+
+SELECT aidb.create_table_preparer(
+ name => 'chunking_preparer__1321',
+ operation => 'ChunkText',
+ source_table => 'source_table__1321',
+ source_key_column => 'id',
+ source_data_column => 'content',
+ destination_table => 'chunked_data__1321',
+ destination_data_column => 'chunk',
+ destination_key_column => 'id',
+ options => '{"desired_length": 1, "max_length": 1000}'::JSONB -- Configuration for the ChunkText operation
+);
+```
+
+## Create the second Preparer to summarize the chunked text
+
+```sql
+-- Create the model. It must support the decode_text and decode_text_batch operations.
+SELECT aidb.create_model('model__1321', 't5_local');
+
+SELECT aidb.create_table_preparer(
+ name => 'summarizing_preparer__1321',
+ operation => 'SummarizeText',
+ source_table => 'chunked_data__1321', -- Reference the output from the ChunkText preparer
+ source_key_column => 'unique_id', -- Reference the unique column from the output of the ChunkText preparer
+ source_data_column => 'chunk', -- Reference the output from the ChunkText preparer
+ destination_table => 'summarized_data__1321',
+ destination_data_column => 'summary',
+ destination_key_column => 'chunk_unique_id',
+ options => '{"model": "model__1321"}'::JSONB -- Configuration for the SummarizeText operation
+);
+```
+
+!!! Tip
+This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail.
+!!!
+
+## Set both Preparers to Live automatic processing
+
+```sql
+SELECT aidb.set_auto_preparer('chunking_preparer__1321', 'Live');
+SELECT aidb.set_auto_preparer('summarizing_preparer__1321', 'Live');
+```
+
+## Insert data for processing
+
+Now, when we insert data into the source table, processed results flow through both preparers automatically.
+
+```sql
+INSERT INTO source_table__1321
+VALUES (1, 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence is 145 characters. This enables processing or storage of data in manageable parts.');
+```
+
+Chunks calculated automatically:
+
+```sql
+SELECT * FROM chunked_data__1321;
+
+__OUTPUT__
+ id | part_id | unique_id | chunk
+----+---------+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------
+ 1 | 0 | 1.part.0 | This is a significantly longer text example that might require splitting into smaller chunks.
+ 1 | 1 | 1.part.1 | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence is 145 characters.
+ 1 | 2 | 1.part.2 | This enables processing or storage of data in manageable parts.
+(3 rows)
+```
+
+Summaries of the chunks calculated automatically:
+
+```sql
+SELECT * FROM summarized_data__1321;
+
+__OUTPUT__
+ chunk_unique_id | summary
+-----------------+------------------------------------------------------------------------------------------------------
+ 1.part.0 | text example might require splitting into smaller chunks .
+ 1.part.1 | the purpose of this function is to partition text data into segments of a specified maximum length .
+ 1.part.2 | enables processing or storage of data in manageable parts .
+(3 rows)
+```
+
+The same automatic processing applies to deletions:
+
+```sql
+DELETE FROM source_table__1321 WHERE id = 1;
+```
+
+```sql
+SELECT * FROM chunked_data__1321;
+
+__OUTPUT__
+ id | part_id | unique_id | chunk
+----+---------+-----------+-------
+(0 rows)
+```
+
+```sql
+SELECT * FROM summarized_data__1321;
+
+__OUTPUT__
+ chunk_unique_id | summary
+-----------------+---------
+(0 rows)
+```
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx
index d40f901bc12..c64bb9ef63e 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/preparers/examples/chunk_text_auto_processing.mdx
@@ -6,10 +6,6 @@ description: Examples of using the preparer auto processing in AI Accelerator.
Example of using preparer auto processing with the [ChunkText operation](../primitives#chunk-text) in AI Accelerator.
-!!! Note
-Many of the small confirmation output notices have been removed for brevity.
-!!!
-
!!! Tip
This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the [unnesting concept](../concepts#unnesting) for more detail.
!!!
From d19c476e3e3b5aa3f0d997edc3e42afd3faffb91 Mon Sep 17 00:00:00 2001
From: Noah Baculi
Date: Wed, 14 May 2025 22:33:05 -0700
Subject: [PATCH 20/42] init rel notes
---
.../rel_notes/src/rel_notes_4.1.0.yml | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
index d7c8eebe66a..eadb80e9e45 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
@@ -5,13 +5,21 @@ date: 19 May 2025
intro: |
This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline.
highlights: |
- - MOAR AI
+ - Automatic unnesting of Preparer results for operations that transform the shape of data.
relnotes:
-- relnote: Placeholder for future release note.
+- relnote: Automatic unnesting of Preparer results for operations that transform the shape of data.
details: |
- Soon.
+ The preparer pipeline for operations that transform the shape of their input data with an additional dimension now unnests the result collections.
+ This allows the output of preparers to be consumed much more easily by other preparers or knowledge bases.
+ Unnested results are returned with a new `part_id` column to track the new dimension. There is also a new `unique_id` column to uniquely identify the combination of the source key and `part_id`.
jira: ""
addresses: ""
type: Enhancement
- impact: Medium
-
+ impact: High
+- relnote: Change output column for `chunk_text()` primitive function
+ details: |
+ The enumeration column returned by the `chunk_text()` primitive function is now `part_id` to match the other Preparer primitives/operations.
+ jira: ""
+ addresses: ""
+ type: Enhancement
+ impact: Low
From f726fe8819478ecd7419424be278093298e3afc1 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
<41898282+github-actions[bot]@users.noreply.github.com>
Date: Thu, 15 May 2025 05:35:57 +0000
Subject: [PATCH 21/42] update generated release notes
---
.../rel_notes/ai-accelerator_4.1.0_rel_notes.mdx | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
index cad6f23262f..2313ee61aa2 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
@@ -11,12 +11,16 @@ This is a minor release that includes a few bug fixes and enhancements to the kn
## Highlights
-- MOAR AI
+- Automatic unnesting of Preparer results for operations that transform the shape of data.
## Enhancements
Description | Addresses |
-Placeholder for future release note.
Soon.
+Automatic unnesting of Preparer results for operations that transform the shape of data.
The preparer pipeline for operations that transform the shape of their input data with an additional dimension now unnests the result collections.
+This allows the output of preparers to be consumed much more easily by other preparers or knowledge bases.
+Unnested results are returned with a new part_id column to track the new dimension. There is also a new unique_id column to uniquely identify the combination of the source key and part_id.
+ | |
+Change output column for chunk_text() primitive function
The enumeration column returned by the chunk_text() primitive function is now part_id to match the other Preparer primitives/operations.
| |
|
From 8bca94c03ef78c07c41bbac33d345e9e55cca08a Mon Sep 17 00:00:00 2001
From: Noah Baculi
Date: Wed, 14 May 2025 22:37:38 -0700
Subject: [PATCH 22/42] refine rel note
---
.../ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
index eadb80e9e45..6dc6e54c004 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
@@ -16,9 +16,10 @@ relnotes:
addresses: ""
type: Enhancement
impact: High
+
- relnote: Change output column for `chunk_text()` primitive function
details: |
- The enumeration column returned by the `chunk_text()` primitive function is now `part_id` to match the other Preparer primitives/operations.
+ The enumeration column returned by the `chunk_text()` primitive function is now `part_id` instead of `chunk_id` to match the other Preparer primitives/operations.
jira: ""
addresses: ""
type: Enhancement
From cef7ae0a43ef683dd9e06eb6d9f02fdf050035cc Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
<41898282+github-actions[bot]@users.noreply.github.com>
Date: Thu, 15 May 2025 05:39:02 +0000
Subject: [PATCH 23/42] update generated release notes
---
.../ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
index 2313ee61aa2..00f6439f16d 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
@@ -20,7 +20,7 @@ This is a minor release that includes a few bug fixes and enhancements to the kn
This allows the output of preparers to be consumed much more easily by other preparers or knowledge bases.
Unnested results are returned with a new part_id
column to track the new dimension. There is also a new unique_id
column to uniquely identify the combination of the source key and part_id.
|
-Change output column for chunk_text() primitive function
The enumeration column returned by the chunk_text() primitive function is now part_id to match the other Preparer primitives/operations.
+Change output column for chunk_text() primitive function
The enumeration column returned by the chunk_text() primitive function is now part_id instead of chunk_id to match the other Preparer primitives/operations.
| |
From a8aa0f4019d90537e2019b664be2e9f1e2b7684d Mon Sep 17 00:00:00 2001
From: Dj Walker-Morgan
Date: Wed, 14 May 2025 17:53:18 +0100
Subject: [PATCH 24/42] Release Notes for 4.1.0 stubbed
Signed-off-by: Dj Walker-Morgan
---
.../ai-accelerator_4.1.0_rel_notes.mdx | 23 +++++++++++++++++++
.../ai-accelerator/rel_notes/index.mdx | 2 ++
.../rel_notes/src/rel_notes_4.1.0.yml | 17 ++++++++++++++
3 files changed, 42 insertions(+)
create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
new file mode 100644
index 00000000000..cad6f23262f
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
@@ -0,0 +1,23 @@
+---
+title: AI Accelerator - Pipelines 4.1.0 release notes
+navTitle: Version 4.1.0
+originalFilePath: advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
+editTarget: originalFilePath
+---
+
+Released: 19 May 2025
+
+This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline.
+
+## Highlights
+
+- MOAR AI
+
+## Enhancements
+
+Description | Addresses |
+Placeholder for future release note.
Soon.
+ | |
+
+
+
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx
index dbc87bf6dfd..a46870bd873 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx
@@ -4,6 +4,7 @@ navTitle: Release notes
description: Release notes for EDB Postgres AI - AI Accelerator
indexCards: none
navigation:
+ - ai-accelerator_4.1.0_rel_notes
- ai-accelerator_4.0.1_rel_notes
- ai-accelerator_4.0.0_rel_notes
- ai-accelerator_3.0.1_rel_notes
@@ -22,6 +23,7 @@ The EDB Postgres AI - AI Accelerator describes the latest version of AI Accelera
| AI Accelerator version | Release Date |
|---|---|
+| [4.1.0](./ai-accelerator_4.1.0_rel_notes) | 19 May 2025 |
| [4.0.1](./ai-accelerator_4.0.1_rel_notes) | 09 May 2025 |
| [4.0.0](./ai-accelerator_4.0.0_rel_notes) | 05 May 2025 |
| [3.0.1](./ai-accelerator_3.0.1_rel_notes) | 03 Apr 2025 |
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
new file mode 100644
index 00000000000..d7c8eebe66a
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
@@ -0,0 +1,17 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/EnterpriseDB/docs/refs/heads/develop/tools/automation/generators/relgen/relnote-schema.json
+product: AI Accelerator - Pipelines
+version: 4.1.0
+date: 19 May 2025
+intro: |
+ This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline.
+highlights: |
+ - MOAR AI
+relnotes:
+- relnote: Placeholder for future release note.
+ details: |
+ Soon.
+ jira: ""
+ addresses: ""
+ type: Enhancement
+ impact: Medium
+
From e809ecf8199c4686da33f268ba387a3f31a1bd2d Mon Sep 17 00:00:00 2001
From: Dj Walker-Morgan
Date: Thu, 15 May 2025 11:26:07 +0100
Subject: [PATCH 25/42] Remove New from front page
Signed-off-by: Dj Walker-Morgan
---
src/pages/index.js | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/pages/index.js b/src/pages/index.js
index 921ba62eab1..6232e6907ab 100644
--- a/src/pages/index.js
+++ b/src/pages/index.js
@@ -282,7 +282,7 @@ const Page = () => {
Get Started with Pipelines
- New: AI Accelerator Preparers
+ AI Accelerator Preparers
PGvector
From 75e81762f2014e70cb04a15542c949e5f13f5d53 Mon Sep 17 00:00:00 2001
From: Dj Walker-Morgan
Date: Wed, 14 May 2025 17:53:18 +0100
Subject: [PATCH 26/42] Release Notes for 4.1.0 stubbed
Signed-off-by: Dj Walker-Morgan
---
.../ai-accelerator_4.1.0_rel_notes.mdx | 23 +++++++++++++++++++
.../ai-accelerator/rel_notes/index.mdx | 2 ++
.../rel_notes/src/rel_notes_4.1.0.yml | 17 ++++++++++++++
3 files changed, 42 insertions(+)
create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
new file mode 100644
index 00000000000..cad6f23262f
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
@@ -0,0 +1,23 @@
+---
+title: AI Accelerator - Pipelines 4.1.0 release notes
+navTitle: Version 4.1.0
+originalFilePath: advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
+editTarget: originalFilePath
+---
+
+Released: 19 May 2025
+
+This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline.
+
+## Highlights
+
+- MOAR AI
+
+## Enhancements
+
+Description | Addresses |
+Placeholder for future release note.
Soon.
+ | |
+
+
+
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx
index dbc87bf6dfd..a46870bd873 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx
@@ -4,6 +4,7 @@ navTitle: Release notes
description: Release notes for EDB Postgres AI - AI Accelerator
indexCards: none
navigation:
+ - ai-accelerator_4.1.0_rel_notes
- ai-accelerator_4.0.1_rel_notes
- ai-accelerator_4.0.0_rel_notes
- ai-accelerator_3.0.1_rel_notes
@@ -22,6 +23,7 @@ The EDB Postgres AI - AI Accelerator describes the latest version of AI Accelera
| AI Accelerator version | Release Date |
|---|---|
+| [4.1.0](./ai-accelerator_4.1.0_rel_notes) | 19 May 2025 |
| [4.0.1](./ai-accelerator_4.0.1_rel_notes) | 09 May 2025 |
| [4.0.0](./ai-accelerator_4.0.0_rel_notes) | 05 May 2025 |
| [3.0.1](./ai-accelerator_3.0.1_rel_notes) | 03 Apr 2025 |
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
new file mode 100644
index 00000000000..d7c8eebe66a
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
@@ -0,0 +1,17 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/EnterpriseDB/docs/refs/heads/develop/tools/automation/generators/relgen/relnote-schema.json
+product: AI Accelerator - Pipelines
+version: 4.1.0
+date: 19 May 2025
+intro: |
+ This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline.
+highlights: |
+ - MOAR AI
+relnotes:
+- relnote: Placeholder for future release note.
+ details: |
+ Soon.
+ jira: ""
+ addresses: ""
+ type: Enhancement
+ impact: Medium
+
From 5159c178528a98dd05ee6c75521fa05b0edd60ba Mon Sep 17 00:00:00 2001
From: Dj Walker-Morgan
Date: Thu, 15 May 2025 11:26:07 +0100
Subject: [PATCH 27/42] Remove New from front page
Signed-off-by: Dj Walker-Morgan
---
src/pages/index.js | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/pages/index.js b/src/pages/index.js
index 921ba62eab1..6232e6907ab 100644
--- a/src/pages/index.js
+++ b/src/pages/index.js
@@ -282,7 +282,7 @@ const Page = () => {
Get Started with Pipelines
- New: AI Accelerator Preparers
+ AI Accelerator Preparers
PGvector
From 3540c527582adae10202aedaa3c767f8e6e3da9e Mon Sep 17 00:00:00 2001
From: Tim Waizenegger
Date: Thu, 15 May 2025 13:12:56 +0200
Subject: [PATCH 28/42] Notes about model batch processing
---
.../models/supported-models/embeddings.mdx | 34 +++++++++++++++----
.../rel_notes/src/rel_notes_4.1.0.yml | 17 +++++++---
2 files changed, 41 insertions(+), 10 deletions(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx
index 9e9e14cf5a4..3679a491a6b 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx
@@ -42,7 +42,7 @@ Based on the name of the model, the model provider sets defaults accordingly:
## Creating the default with OpenAI model
```sql
-SELECT aidb.create_model('my_openai_embeddings',
+SELECT aidb.create_model('my_openai_embeddings',
'openai_embeddings',
credentials=>'{"api_key": "sk-abc123xyz456def789ghi012jkl345mn"'::JSONB);
```
@@ -58,7 +58,7 @@ SELECT aidb.create_model(
'my_openai_model',
'openai_embeddings',
'{"model": "text-embedding-3-small"}'::JSONB,
- '{"api_key": "sk-abc123xyz456def789ghi012jkl345mn"}'::JSONB
+ '{"api_key": "sk-abc123xyz456def789ghi012jkl345mn"}'::JSONB
);
```
@@ -69,12 +69,34 @@ Because this example is passing the configuration options and the credentials, u
The following configuration settings are available for OpenAI models:
* `model` — The OpenAI model to use.
-* `url` — The URL of the model to use. This value is optional and can be used to specify a custom model URL.
- * If `openai_completions` (or `completions`) is the `model`, `url` defaults to `https://api.openai.com/v1/chat/completions`.
+* `url` — The URL of the model to use. This value is optional and can be used to specify a custom model URL.
+ * If `openai_completions` (or `completions`) is the `model`, `url` defaults to `https://api.openai.com/v1/chat/completions`.
* If `nim_completions` is the `model`, `url` defaults to `https://integrate.api.nvidia.com/v1/chat/completions`.
* `max_concurrent_requests` — The maximum number of concurrent requests to make to the OpenAI model. The default is `25`.
-
-## Model credentials
+* `max_batch_size` — The maximum number of records to send to the model in a single request. The default is `50000`.
+
+### Batch and parallel processing
+The model providers for `embeddings`, `openai_embeddings`, and `nim_embeddings` support sending batch requests as well as concurrent requests.
+The two settings `max_concurrent_requests` and `max_batch_size` control this behavior. When a model provider receives a set of records (for example, from a knowledge base pipeline), the following happens:
+* Assume the knowledge base pipeline is configured with a batch size of 10,000 and the model provider is configured with `max_batch_size=1000` and `max_concurrent_requests=5`.
+* The provider collects up to 1,000 records and sends them in a single request to the model.
+* It sends up to 5 such requests concurrently, until no more input records are left.
+* In this example, the provider needs to send and receive 10 batches in total.
+  * After sending the first 5 requests, it waits for the responses to return.
+  * As soon as a response is received, another request can be sent.
+  * The provider doesn't wait for all 5 responses before sending the next 5 requests. Instead, it always keeps up to 5 requests in flight.
+
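+A sketch of setting both options when creating a model (the model name and API key here are placeholders):
+
+```sql
+SELECT aidb.create_model(
+    'my_batched_model',
+    'openai_embeddings',
+    '{"model": "text-embedding-3-small", "max_batch_size": 1000, "max_concurrent_requests": 5}'::JSONB,
+    '{"api_key": "sk-abc123xyz456def789ghi012jkl345mn"}'::JSONB
+);
+```
+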
+!!! Note
+The settings `max_concurrent_requests` and `max_batch_size` can have a significant impact on model performance, but the optimal values depend heavily on
+your hardware and infrastructure.
+
+We recommend testing different combinations by using a knowledge base pipeline. See our model performance tuning guide here: TODO
+!!!
+
+
+### Model credentials
The following credentials may be required by the service providing these models. Note: `api_key` and `basic_auth` are exclusive. Only one of these two options can be used.
* `api_key` — The API key to use for Bearer Token authentication. The api_key will be sent in a header field as `Authorization: Bearer `.
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
index 109b75ade5a..be37e6ba9ff 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
@@ -3,25 +3,34 @@ product: AI Accelerator - Pipelines
version: 4.1.0
date: 19 May 2025
intro: |
- This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline.
+ This is a minor release that includes enhancements to the preparer pipeline and the model API providers.
highlights: |
- Automatic unnesting of Preparer results for operations that transform the shape of data.
+ - Batch processing for embeddings with external models.
relnotes:
- relnote: Automatic unnesting of Preparer results for operations that transform the shape of data.
details: |
The preparer pipeline for operations that transform the shape of their input data with an additional dimension now unnest their result collections.
This allows the output of preparers to be consumed much more easily by other preparers or knowledge bases.
    Unnested results are returned with a new `part_id` column to track the new dimension. There is also a new `unique_id` column to uniquely identify the combination of the source key and part_id.
- jira: ""
+ jira: "AID-410"
addresses: ""
type: Enhancement
impact: High
-- relnote: Change output column for `chunk_text()` primitive function
+- relnote: Change output column for `chunk_text()` primitive function.
details: |
The enumeration column returned by the `chunk_text()` primitive function is now `part_id` instead of `chunk_id` to match the other Preparer primitives/operations.
- jira: ""
+ jira: "AID-410"
addresses: ""
type: Enhancement
impact: Low
+- relnote: Batch processing for embeddings with external models.
+ details: |
+ The external model providers `embeddings`, `openai_embeddings`, and `nim_embeddings` can now send a batch of inputs in a single request, rather than multiple concurrent requests.
+ This can improve performance and hardware utilization. The feature is fully configurable and can also be disabled.
+ jira: "AID-419"
+ addresses: ""
+ type: Enhancement
+ impact: Medium
From 2d3d86760ce5dc01cbeb021bf1f7cf2965cf1f5c Mon Sep 17 00:00:00 2001
From: Tim Waizenegger
Date: Thu, 15 May 2025 15:46:26 +0200
Subject: [PATCH 29/42] performance tuning guide
---
.../capabilities/auto-processing.mdx | 6 +-
.../knowledge_base/performance_tuning.mdx | 117 ++++++++++++++++++
.../models/supported-models/embeddings.mdx | 3 +-
3 files changed, 124 insertions(+), 2 deletions(-)
create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx
index 920ccae199d..d5ec8c3f827 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx
@@ -123,7 +123,11 @@ As well as for existing pipelines:
- With [`aidb.set_auto_knowledge_base`](../reference/knowledge_bases#aidbset_auto_knowledge_base)
## Batch processing
-In Background and Disabled modes, (auto) processing happens in batches of configurable size. Within each batch,
+In Background and Disabled modes, (auto) processing happens in batches of configurable size. The pipeline processes all source records batch by batch.
+All records within each batch are processed in parallel wherever possible. This means pipeline steps like data retrieval, embeddings computation, and storing embeddings run as parallel operations.
+For example, when using a table as a data source, a batch of input records is retrieved with a single query. With a volume source, concurrent requests are used to retrieve a batch of records.
+
+Our [knowledge base pipeline performance tuning guide](../knowledge_base/performance_tuning) explains how the batch size can be tuned for optimal throughput.
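+
+For example, a sketch of adjusting the batch size of an existing pipeline (assuming a knowledge base named `my_kb` that runs in Background mode):
+
+```sql
+SELECT aidb.set_auto_knowledge_base('my_kb', 'Background', batch_size => 2000);
+```
+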
## Change detection
AIDB auto-processing is designed around change detection mechanisms for table and volume data sources. This allows it to only
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx
new file mode 100644
index 00000000000..3d338d14f8e
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx
@@ -0,0 +1,117 @@
+---
+title: "Pipelines knowledge base performance tuning"
+navTitle: "Performance tuning"
+deepToC: true
+description: "How to tune the performance of knowledge base pipelines."
+---
+
+
+## Background
+The performance (that is, the throughput of embeddings per second) can be optimized by changing pipeline and model settings.
+This guide explains the relevant settings and shows how to tune them.
+
+Knowledge base pipelines process collections of individual records (rows in a table or objects in a volume). Rather than processing each record individually and sequentially, or processing all of them concurrently,
+AIDB offers batch processing. The batches are processed sequentially, one after the other. Within each batch, records are processed concurrently wherever possible.
+
+- [Pipeline `batch_size`](../capabilities/auto-processing) determines how many records each batch contains.
+- Some model providers have configurable internal batch/parallel processing. We recommend leaving these settings at their default values and using the pipeline batch size to control execution.
+
+
+## Testing and tuning performance
+We will first set up test data and a knowledge base pipeline, then measure and tune the batch size.
+
+### 1) Create a table and insert test data
+The actual data content does not matter for this test, so we can generate data:
+```sql
+CREATE TABLE test_data_10k (id INT PRIMARY KEY, msg TEXT NOT NULL);
+
+INSERT INTO test_data_10k (id, msg) SELECT generate_series(1, 10000) AS id, 'hello world';
+```
+
+
+### 2) Create a knowledge base pipeline
+The optimal batch size can vary significantly between models. Measure and tune the batch size for each model you want to use.
+```sql
+SELECT aidb.create_table_knowledge_base(
+ name => 'perf_test',
+ model_name => 'my_model', -- use the model you want to optimize for
+ source_table => 'test_data_10k',
+ source_data_column => 'msg',
+ source_data_format => 'Text',
+ auto_processing => 'Disabled', -- we want to manually run the pipeline to measure the runtime
+    batch_size => 100 -- this is the parameter we will tune during this test
+);
+__OUTPUT__
+INFO: using vector table: public.perf_test_vector
+NOTICE: index "vdx_perf_test_vector" does not exist, skipping
+NOTICE: auto-processing is set to "Disabled". Manually run "SELECT aidb.bulk_embedding('perf_test');" to compute embeddings.
+ create_table_knowledge_base
+-----------------------------
+ perf_test
+(1 row)
+```
+
+### 3) Run the pipeline and measure the performance
+This test uses `psql`; the `\timing on` command is a psql feature. If you use a different client, check how it displays timing information.
+
+```sql
+\timing on
+__OUTPUT__
+Timing is on.
+```
+
+Now run the pipeline:
+```sql
+SELECT aidb.bulk_embedding('perf_test');
+__OUTPUT__
+INFO: perf_test: (re)setting state table to process all data...
+INFO: perf_test: Starting... Batch size 100, unprocessed rows: 10000, count(source records): 10000, count(embeddings): 0
+INFO: perf_test: Batch iteration finished, unprocessed rows: 9900, count(source records): 10000, count(embeddings): 100
+INFO: perf_test: Batch iteration finished, unprocessed rows: 9800, count(source records): 10000, count(embeddings): 200
+...
+INFO: perf_test: Batch iteration finished, unprocessed rows: 0, count(source records): 10000, count(embeddings): 10000
+INFO: perf_test: finished, unprocessed rows: 0, count(source records): 10000, count(embeddings): 10000
+ bulk_embedding
+----------------
+
+(1 row)
+
+Time: 207161,174 ms (03:27,161)
+```
+
+
+
+### 4) Tune the batch size
+Use this call to adjust the batch size of the pipeline. In this example, we increase it tenfold, to 1,000 records:
+```sql
+SELECT aidb.set_auto_knowledge_base('perf_test', 'Disabled', batch_size=>1000);
+```
+
+Run the pipeline again.
+
+!!! Note
+When using a Postgres table as the source with auto-processing disabled, AIDB has no means to detect changes in the source data, so each `aidb.bulk_embedding()` call has to reprocess everything.
+
+This is convenient for performance testing.
+
+If you want to measure performance with a volume source, delete and re-create the knowledge base between tests. AIDB can detect changes on volumes even with auto-processing disabled.
+
+!!!
+```sql
+SELECT aidb.bulk_embedding('perf_test');
+__OUTPUT__
+INFO: perf_test: (re)setting state table to process all data...
+INFO: perf_test: Starting... Batch size 1000, unprocessed rows: 10000, count(source records): 10000, count(embeddings): 10000
+...
+INFO: perf_test: finished, unprocessed rows: 0, count(source records): 10000, count(embeddings): 10000
+ bulk_embedding
+----------------
+
+(1 row)
+
+Time: 154276,486 ms (02:34,276)
+```
+
+
+## Conclusion
+In this test, the pipeline took 02:34 min with a batch size of 1,000 and 03:27 min with a batch size of 100. You can continue testing larger sizes until performance no longer improves or starts to decline.
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx
index 3679a491a6b..1bdd2a99a32 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/models/supported-models/embeddings.mdx
@@ -92,7 +92,8 @@ The two settings `max_concurrent_requests` and `max_batch_size` control this beh
 The settings `max_concurrent_requests` and `max_batch_size` can have a significant impact on model performance, but the optimal values depend heavily on
 your hardware and infrastructure.
-We recommend testing different combinations by using a knowledge base pipeline. See our model performance tuning guide here: TODO
+We recommend leaving the defaults in place and [tuning the performance via the knowledge base pipeline batch size](../../knowledge_base/performance_tuning).
+The default `max_batch_size` of 50,000 is intentionally high to allow the pipeline to control the actual size of the batches.
!!!
From 5ed14e5a90434cf2ee4517438d23838925853519 Mon Sep 17 00:00:00 2001
From: Tim Waizenegger
Date: Thu, 15 May 2025 16:28:58 +0200
Subject: [PATCH 30/42] note on index type
---
.../ai-accelerator/knowledge_base/performance_tuning.mdx | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx
index 3d338d14f8e..e3c377cf8b8 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx
@@ -16,6 +16,9 @@ AIDB offers batch processing. All the batches get processed sequentially, one af
 - [Pipeline `batch_size`](../capabilities/auto-processing) determines how many records each batch contains.
 - Some model providers have configurable internal batch/parallel processing. We recommend leaving these settings at their default values and using the pipeline batch size to control execution.
+!!! Note
+Vector indexing also has an impact on pipeline performance. You can disable the vector index by using `index_type => 'disabled'` to exclude it from your measurements.
+!!!
## Testing and tuning performance
We will first set up test data and a knowledge base pipeline, then measure and tune the batch size.
@@ -33,11 +36,12 @@ INSERT INTO test_data_10k (id, msg) SELECT generate_series(1, 10000) AS id, 'hel
 The optimal batch size can vary significantly between models. Measure and tune the batch size for each model you want to use.
```sql
SELECT aidb.create_table_knowledge_base(
- name => 'perf_test',
- model_name => 'my_model', -- use the model you want to optimize for
+ name => 'perf_test_b',
+ model_name => 'dummy', -- use the model you want to optimize for
source_table => 'test_data_10k',
source_data_column => 'msg',
source_data_format => 'Text',
+ index_type => 'disabled', -- optionally disable vector indexing to include/exclude it from the measurement
auto_processing => 'Disabled', -- we want to manually run the pipeline to measure the runtime
     batch_size => 100 -- this is the parameter we will tune during this test
);
From 49150a7da22156193ce53464ae6d6bd50c0bf2b3 Mon Sep 17 00:00:00 2001
From: Tim Waizenegger
Date: Thu, 15 May 2025 16:30:50 +0200
Subject: [PATCH 31/42] note on index type
---
.../ai-accelerator/knowledge_base/performance_tuning.mdx | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx
index e3c377cf8b8..bf6e8b3869d 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/knowledge_base/performance_tuning.mdx
@@ -24,7 +24,7 @@ Vector indexing also has an impact on pipeline performance. You can disable the
We will first set up test data and a knowledge base pipeline, then measure and tune the batch size.
### 1) Create a table and insert test data
-The actual data content does not matter for this test, so we can generate data:
+The length of the data content has some impact on model performance. You can use longer text to test that, as in the sketch below.
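+
+For example, instead of the short 'hello world' string used in the generation query, you could insert longer repeated text (a sketch; roughly 1 kB per row is an arbitrary assumption):
+
+```sql
+-- Hypothetical variant: use in place of the 'hello world' INSERT
+INSERT INTO test_data_10k (id, msg)
+SELECT generate_series(1, 10000) AS id, repeat('hello world ', 85);
+```
+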
```sql
CREATE TABLE test_data_10k (id INT PRIMARY KEY, msg TEXT NOT NULL);
From 47edd9f08f06642d71e8ef62e1ba3d1b9a016d66 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Artjoms=20I=C5=A1kovs?=
Date: Mon, 19 May 2025 08:50:39 +0100
Subject: [PATCH 32/42] Add documentation for Google Cloud Storage
---
.../ai-accelerator/pgfs/functions/gcs.mdx | 45 +++++++++++++++++++
.../ai-accelerator/reference/pgfs.mdx | 30 ++++++-------
2 files changed, 60 insertions(+), 15 deletions(-)
create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/gcs.mdx
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/gcs.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/gcs.mdx
new file mode 100644
index 00000000000..3e5ba17f165
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/gcs.mdx
@@ -0,0 +1,45 @@
+---
+title: "Pipelines PGFS with Google Cloud Storage"
+navTitle: "Google Cloud storage"
+description: "PGFS options and credentials with Google Cloud Storage."
+---
+
+
+## Overview: Google Cloud Storage
+PGFS uses the `gs:` prefix to indicate a Google Cloud Storage bucket.
+
+The general syntax for using GCS is this:
+```sql
+select pgfs.create_storage_location(
+ 'storage_location_name',
+ 'gs://bucket_name',
+ credentials => '{}'::JSONB
+ );
+```
+
+### The `credentials` argument in JSON format offers the following settings:
+| Option | Description |
+|------------------------------------|------------------------------------------|
+| `google_application_credentials` | Path to the application credentials file |
+| `google_service_account_key_file` | Path to the service account key file |
+
+See the [Google Cloud documentation](https://cloud.google.com/iam/docs/keys-create-delete#creating) for more information on how to manage service account keys.
+
+These options can also be set via the equivalent environment variables to facilitate authentication in managed environments such as Google Kubernetes Engine.
+
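+A sketch of supplying the key file through an environment variable (assuming the key is mounted at `/var/run/gcs.json`):
+
+```shell
+export GOOGLE_SERVICE_ACCOUNT_KEY_FILE=/var/run/gcs.json
+```
+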
+## Example: private GCS bucket
+
+```sql
+SELECT pgfs.create_storage_location('edb_ai_example_images', 'gs://my-company-ai-images',
+ credentials => '{"google_service_account_key_file": "/var/run/gcs.json"}'
+ );
+```
+
+## Example: authentication in GKE
+
+Ensure that the `GOOGLE_APPLICATION_CREDENTIALS` or the `GOOGLE_SERVICE_ACCOUNT_KEY_FILE` environment variable
+is set on your PostgreSQL pod. PGFS then picks it up automatically:
+
+```sql
+SELECT pgfs.create_storage_location('edb_ai_example_images', 'gs://my-company-ai-images');
+```
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/pgfs.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/pgfs.mdx
index 296560f64fa..48366f40eed 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/pgfs.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/reference/pgfs.mdx
@@ -5,7 +5,7 @@ description: "Reference documentation for EDB Postgres AI - AI Accelerator Pipel
deepToC: true
---
-This reference documentation for EDB Postgres AI - AI Accelerator Pipelines PGFS includes information on the functions and views available in the [pgfs](../pgfs) extension. These functions give aidb access to S3-compatible file systems and local file systems.
+This reference documentation for EDB Postgres AI - AI Accelerator Pipelines PGFS includes information on the functions and views available in the [pgfs](../pgfs) extension. These functions give aidb access to S3-compatible file systems, Google Cloud Storage buckets, and local file systems.
## pgfs
@@ -44,13 +44,13 @@ Creates a storage location in the database.
#### Parameters
-| Parameter | Type | Default | Description |
-|---------------|-------|---------|---------------------------------------------------------|
-| `name` | text | | Name for storage location |
-| `url` | text | | URL for this storage location (prefix `s3:` or `file:`) |
-| `msl_id` | uuid | | Unused |
-| `options` | jsonb | | Options for the storage location |
-| `credentials` | jsonb | | Credentials for the storage location |
+| Parameter | Type | Default | Description |
+|---------------|-------|---------|-----------------------------------------------------------------|
+| `name` | text | | Name for storage location |
+| `url` | text | | URL for this storage location (prefix `s3:`, `gs:`, or `file:`) |
+| `msl_id` | uuid | | Unused |
+| `options` | jsonb | | Options for the storage location |
+| `credentials` | jsonb | | Credentials for the storage location |
#### Example
@@ -64,13 +64,13 @@ Creates a storage location in the database and associates it with a foreign tabl
#### Parameters
-| Parameter | Type | Default | Description |
-|-------------------------|------|---------|---------------------------------------------------------|
-| `storage_location_name` | text | | Name for storage location |
-| `url` | text | | URL for this storage location (prefix `s3:` or `file:`) |
-| `msl_id` | uuid | | Unused |
-| `options` | json | | Options for the storage location |
-| `credentials` | json | | Credentials for the storage location |
+| Parameter | Type | Default | Description |
+|-------------------------|------|---------|----------------------------------------------------------------|
+| `storage_location_name` | text | | Name for storage location |
+| `url` | text | | URL for this storage location (prefix `s3:`, `gs:` or `file:`) |
+| `msl_id` | uuid | | Unused |
+| `options` | json | | Options for the storage location |
+| `credentials` | json | | Credentials for the storage location |
#### Example
From 0b3a128e02760943368e5026acfeaa92cb9ccf86 Mon Sep 17 00:00:00 2001
From: Tim Waizenegger
Date: Mon, 19 May 2025 10:23:45 +0200
Subject: [PATCH 33/42] document PGFS non-https usage
---
.../ai-accelerator/pgfs/functions/s3.mdx | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/s3.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/s3.mdx
index a97bd57cc69..ba30235113a 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/s3.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/pgfs/functions/s3.mdx
@@ -25,6 +25,7 @@ select pgfs.create_storage_location(
| `skip_signature` | Disable HMAC authentication (set this to "true" when you're not providing access_key_id/secret_access_key in the credentials). |
| `region` | The region of the S3-compatible storage system. If the region is not specified, the client will attempt auto-discovery. |
| `endpoint` | The endpoint of the S3-compatible storage system. |
+| `allow_http` | Whether the endpoint uses plain HTTP (rather than HTTPS/TLS). Set this to `true` if your endpoint starts with `http://`. |
### The `credentials` argument in JSON format offers the following settings:
| Option | Description |
@@ -53,7 +54,7 @@ SELECT pgfs.create_storage_location('internal_ai_project', 's3://my-company-ai-i
);
```
-## Example: non-AWS S3 / S3-compatible
+## Example: non-AWS S3 / S3-compatible with HTTPS
This is an example of using an S3-compatible system like minIO. The `endpoint` must be provided in this case; it can only be omitted when using AWS S3.
```sql
@@ -63,4 +64,16 @@ SELECT pgfs.create_storage_location('ai_images_local_minio', 's3://my-ai-images'
);
```
+## Example: non-AWS S3 / S3-compatible with HTTP
+This is an example of using an S3-compatible system like minIO. The `endpoint` must be provided in this case; it can only be omitted when using AWS S3.
+
+In this case, the server doesn't use TLS encryption, so we configure a plain HTTP connection.
+
+```sql
+SELECT pgfs.create_storage_location('ai_images_local_minio', 's3://my-ai-images',
+ options => '{"endpoint": "http://minio-api.apps.local", "allow_http":"true"}',
+ credentials => '{"access_key_id": "my_username", "secret_access_key":"my_password"}'
+ );
+```
+
From 19ac85e279649466205aa71ca179cfdfd81d01c6 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
<41898282+github-actions[bot]@users.noreply.github.com>
Date: Mon, 19 May 2025 08:27:15 +0000
Subject: [PATCH 34/42] update generated release notes
---
.../rel_notes/ai-accelerator_4.1.0_rel_notes.mdx | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
index 86e955b1e5a..51174a0fb6e 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
@@ -7,16 +7,24 @@ editTarget: originalFilePath
Released: 19 May 2025
-This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline.
+This is a minor release that includes enhancements to the preparer pipeline and the model API providers.
## Highlights
- Automatic unnesting of Preparer results for operations that transform the shape of data.
+- Batch processing for embeddings with external models.
## Enhancements
Description | Addresses |
-Placeholder for future release note.
Soon.
+Automatic unnesting of Preparer results for operations that transform the shape of data.
The preparer pipeline for operations that transform the shape of their input data with an additional dimension now unnest their result collections.
+This allows the output of preparers to be consumed much more easily by other preparers or knowledge bases.
+Unnested results are returned with a new part_id column to track the new dimension. There is also a new unique_id column to uniquely identify the combination of the source key and part_id.
+ | |
+Batch processing for embeddings with external models.
The external model providers embeddings , openai_embeddings , and nim_embeddings can now send a batch of inputs in a single request, rather than multiple concurrent requests.
+This can improve performance and hardware utilization. The feature is fully configurable and can also be disabled.
+ | |
+Change output column for chunk_text() primitive function.
The enumeration column returned by the chunk_text() primitive function is now part_id instead of chunk_id to match the other Preparer primitives/operations.
| |
|
From 253ee9da8b6641d674d4ce6af2977dcaa1fa0213 Mon Sep 17 00:00:00 2001
From: Betsy Gitelman
Date: Mon, 28 Apr 2025 16:53:26 -0400
Subject: [PATCH 35/42] Edit of two docs in this group
---
.../pem_security_best_practices/index.mdx | 10 ++---
.../pem_application_configuration.mdx | 39 ++++++++++---------
2 files changed, 25 insertions(+), 24 deletions(-)
diff --git a/product_docs/docs/pem/10/considerations/pem_security_best_practices/index.mdx b/product_docs/docs/pem/10/considerations/pem_security_best_practices/index.mdx
index d8235a3a292..d653625c537 100644
--- a/product_docs/docs/pem/10/considerations/pem_security_best_practices/index.mdx
+++ b/product_docs/docs/pem/10/considerations/pem_security_best_practices/index.mdx
@@ -14,14 +14,14 @@ navigation:
To harden your PEM deployment against attack, consider the following measures:
-1. Ensure PEM itself, your operating system, and third party libraries are regularly updated. Without the most recent security patches, your system is vulnerable to cyberattacks.
- Please refer to the [Dependencies](../../installing/dependencies.mdx) page to learn more about the system packages used by PEM.
+- Ensure PEM, your operating system, and third-party libraries are regularly updated. Without the most recent security patches, your system is vulnerable to cyberattacks.
+ See [Dependencies](../../installing/dependencies.mdx) to learn more about the system packages used by PEM.
-2. Ensure the Postgres instance used as the PEM server is kept up to date and apply [Postgres security best practices](https://info.enterprisedb.com/rs/069-ALB-339/images/Security-best-practices-2020.pdf).
+- Ensure the Postgres instance used as the PEM server is kept up to date and apply [Postgres security best practices](https://info.enterprisedb.com/rs/069-ALB-339/images/Security-best-practices-2020.pdf).
-3. [Secure the web server](apache_httpd_security_configuration.mdx)
+- [Secure the web server](apache_httpd_security_configuration.mdx).
-4. Configure the [security settings of the PEM web application](pem_application_configuration.mdx) as appropriate.
+- Configure the [security settings of the PEM web application](pem_application_configuration.mdx) as appropriate.
diff --git a/product_docs/docs/pem/10/considerations/pem_security_best_practices/pem_application_configuration.mdx b/product_docs/docs/pem/10/considerations/pem_security_best_practices/pem_application_configuration.mdx
index e45a802d116..8de3657b205 100644
--- a/product_docs/docs/pem/10/considerations/pem_security_best_practices/pem_application_configuration.mdx
+++ b/product_docs/docs/pem/10/considerations/pem_security_best_practices/pem_application_configuration.mdx
@@ -9,30 +9,29 @@ redirects:
## Session timeout
-Insufficient session expiration by the web application increases the exposure of other session-based attacks. The attacker has more time to reuse a valid session ID and hijack the associated session. The shorter the session interval is, the less time an attacker has to use the valid session ID. We recommend that you set the inactivity timeout for the web application to a low value to avoid this security issue.
+Setting the session expiration time too long in the web application increases the exposure to other session-based attacks. The attacker has more time to reuse a valid session ID and hijack the associated session. The shorter the session interval is, the less time an attacker has to use the valid session ID. We recommend that you set the inactivity timeout for the web application to a low value to avoid this security issue.
-In PEM, you can set the timeout value for a user session. When there's no user activity for a specified duration on the web console, PEM logs out the user from the web console. A PEM administrator can set the length of time for inactivity. This value is for the whole application and not for each user. To configure the timeout duration, modify the `USER_INACTIVITY_TIMEOUT` parameter in the `config_local.py` file, located in the `/web` directory. By default, this functionality is disabled.
+In PEM, you can set the timeout value for a user session. When there's no user activity for a specified duration on the web console, PEM logs the user out of the web console. A PEM administrator can set the length of time for inactivity. This value is for the whole application, not for each user.
-For example, to specify for an application to log out a user after 15 minutes of inactivity, set:
+To configure the timeout duration, modify the `USER_INACTIVITY_TIMEOUT` parameter in the `config_local.py` file in the `/web` directory. By default, this parameter is disabled. Specify the value in seconds.
+
+For example, to have the application log a user out after 15 minutes of inactivity, set the time as follows:
```ini
USER_INACTIVITY_TIMEOUT = 900
```
-!!! Note
- The timeout value is specified in seconds.
-
-To apply the changes, restart the Apache service.
+To apply the change, restart the Apache service.
-For detailed information on the `config.py` file, see [Managing Configuration Settings](../../managing_configuration_settings/).
+For detailed information on the `config.py` file, see [Managing configuration settings](../../managing_configuration_settings/).
## RestAPI header customization
-You can customize the RestAPI token headers to meet your requirements. The default values aren't exposed by the `config.py` file. Customize the following headers in the `config_local.py` file:
+You can customize the RestAPI token headers to meet your requirements. The default values aren't exposed by the `config.py` file. In the `config_local.py` file, customize the following headers.
### PEM_HEADER_SUBJECT_TOKEN_KEY
-This configuration option allows you to change the HTTP header name to get the generated token. By default, when you send a request to create a token, the server response has an `X-Subject-Token` header. This header contains the value of a newly generated token. If you want to customize the header name, then you can update the `config_local.py` file:
+This configuration option lets you change the HTTP header name to get the generated token. By default, when you send a request to create a token, the server response has an `X-Subject-Token` header. This header contains the value of a newly generated token. If you want to customize the header name, then you can update the `config_local.py` file:
```ini
PEM_HEADER_SUBJECT_TOKEN_KEY = 'Pem-RestAPI-Generate-Token'
@@ -51,13 +50,13 @@ Pem-RestAPI-Generate-Token: 997aef95-d46d-4d84-932a-a80146eaf84f
### PEM_HEADER_TOKEN_KEY
-This configuration option allows you to change the HTTP request header name. With this header name, you can send the token to the PEM server. By default, when you send a request to generate a token, the token header name is `X-Auth-Token`. If you want to customize the RestAPI request header name, then you can update the `config_local.py` file:
+This configuration option lets you change the header name of the HTTP request. With this header name, you can send the token to the PEM server. By default, when you send a request to generate a token, the token header name is `X-Auth-Token`. If you want to customize the RestAPI request header name, you can update the `config_local.py` file:
```ini
PEM_HEADER_TOKEN_KEY = 'Pem-Token'
```
-This setting allows you to send the token:
+This setting lets you send the token:
```shell
$ curl -Lk -X GET -H "Pem-Token: gw5rzaloxydp91ttd1c97w24b5sv60clic24sxy9" https://localhost:8443/pem/api/v4/agent
@@ -65,35 +64,35 @@ $ curl -Lk -X GET -H "Pem-Token: gw5rzaloxydp91ttd1c97w24b5sv60clic24sxy9" https
### PEM_TOKEN_EXPIRY
-This configuration option allows you to change the PEM RestAPI token expiry time after it's generated. By default, the token expiry time is set to 20 minutes (1200 seconds). If you want to change the token expiry time to 10 minutes, then you can update the `config_local.py` file:
+This configuration option lets you change the PEM RestAPI token expiry time after it's generated. By default, the token expiry time is set to 20 minutes (1200 seconds). For example, to change the token expiry time to 10 minutes, update the `config_local.py` file as follows:
```ini
PEM_TOKEN_EXPIRY = 600
```
-To apply the changes, restart the Apache service.
+To apply the change, restart the Apache service.
## Role-based access control in PEM
-Role-based access control (RBAC) restricts application access based on a user’s role in an organization and is one of the primary methods for access control. The roles in RBAC refer to the levels of access that users have to the application. Users are allowed to access only the information needed to do their jobs. Roles in PEM are inheritable and additive, rather than subscriptive. In other words, as a PEM admin you need to grant the lowest level role to the user and then grant the roles the user needs to perform their job. For example, to give access only to SQL profiler:
+Role-based access control (RBAC) restricts application access based on a user’s role in an organization. It's one of the primary methods for access control. The roles in RBAC refer to the levels of access that users have to the application. Users are allowed to access only the information needed to do their jobs. Roles in PEM are inheritable and additive rather than subscriptive. In other words, as a PEM admin, you need to grant the lowest level role to the user and then grant the roles the user needs to perform their job. For example, to give access only to SQL Profiler:
```sql
CREATE ROLE user_sql_profiler WITH LOGIN NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT NOREPLICATION CONNECTION LIMIT -1 PASSWORD 'xxxxxx';
GRANT pem_user, pem_comp_sqlprofiler TO user_sql_profiler;
```
-For detailed information on roles, see [PEM Roles](../../managing_pem_server/#using-pem-predefined-roles-to-manage-access-to-pem-functionality).
+For detailed information on roles, see [PEM roles](../../managing_pem_server/#using-pem-predefined-roles-to-manage-access-to-pem-functionality).
## SQL/Protect plugin
-Often, preventing an SQL injection attack is the responsibility of the application developer, while the database administrator has little or no control over the potential threat. The difficulty for database administrators is that the application must have access to the data to function properly.
+Often, preventing an SQL injection attack is the responsibility of the application developer. The database administrator has little or no control over the potential threat. The difficulty for database administrators is that the application must have access to the data to function properly.
SQL/Protect is a module that allows a database administrator to protect a database from SQL injection attacks. SQL/Protect examines incoming queries for typical SQL injection profiles in addition to the standard database security policies.
Attackers can perpetrate SQL injection attacks with several different techniques. A specific signature characterizes each technique. SQL/Protect examines queries for unauthorized relations, utility commands, SQL tautology, and unbounded DML statements. SQL/Protect gives the control back to the database administrator by alerting the administrator to potentially dangerous queries and then blocking those queries.
!!! Note
- This plugin works only on the EDB Postgres Advanced Server server, so this is useful only when your PEM database is hosted on the EDB Postgres Advanced Server server.
+ This plugin is useful only when your PEM database is hosted on the EDB Postgres Advanced Server server. It doesn't work on other servers.
For detailed information about the SQL Profiler plugin, see [SQL Profiler](../../profiling_workloads/).
@@ -110,7 +109,9 @@ One security tip for PEM administrative users is to change your PEM login passwo
In most cases, pemAgent is installed as a root user and runs as a daemon process with root privileges. By default, PEM disables running the scheduled jobs/task. PEM provides support for running scheduled jobs as a non-root user by changing the pemAgent configuration file.
-To run scheduled jobs as a non-root user, modify the entry for the `batch_script_user` parameter in the `agent.cfg` file and specify the user to run the script. You can either specify a non-root user or root user identity. If you don't specify a user, or the specified user doesn't exist, then the script doesn't execute. Restart the agent after modifying the file. If a non-root user is running `pemagent`, then the value of `batch_script_user` is ignored, and the same non-root user used for running the `pemagent` executes the script.
+To run scheduled jobs as a non-root user, modify the entry for the `batch_script_user` parameter in the `agent.cfg` file and specify the user to run the script. You can specify either a non-root user or root user identity. If you don't specify a user or the specified user doesn't exist, the script doesn't execute.
+
+After modifying the file, restart the agent. If a non-root user is running pemAgent, the value of `batch_script_user` is ignored, and the non-root user running pemAgent executes the script.
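+
+For example, a minimal sketch of the relevant `agent.cfg` entry (the user name `pemuser` is a placeholder for an existing non-root account):
+
+```ini
+batch_script_user=pemuser
+```
+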
To invoke a script on a Windows system, set the registry entry for `AllowBatchJobSteps` to `true` and restart the PEM agent. PEM registry entries are located in:
From fb5527dfa76d2f5537cbe807ae1889211ec85f5e Mon Sep 17 00:00:00 2001
From: Betsy Gitelman
Date: Tue, 29 Apr 2025 14:35:36 -0400
Subject: [PATCH 36/42] Edits to PEM 10 - group 3
---
.../docs/pem/10/considerations/index.mdx | 7 +-
...rver_and_apache_web_server_preferences.mdx | 8 +-
.../pem_application_configuration.mdx | 6 +-
.../10/considerations/setup_ha_using_efm.mdx | 513 ++++++++++++++++++
4 files changed, 523 insertions(+), 11 deletions(-)
create mode 100644 product_docs/docs/pem/10/considerations/setup_ha_using_efm.mdx
diff --git a/product_docs/docs/pem/10/considerations/index.mdx b/product_docs/docs/pem/10/considerations/index.mdx
index 8cc82f17176..92f5d7726cb 100644
--- a/product_docs/docs/pem/10/considerations/index.mdx
+++ b/product_docs/docs/pem/10/considerations/index.mdx
@@ -9,13 +9,12 @@ navigation:
- installing_pem_server_and_apache_web_server_preferences
---
-There are a number of things to consider before deploying Postgres Enterprise Manager.
+Before deploying Postgres Enterprise Manager, consider these factors.
| Considerations | Implementation instructions |
| ---------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| Is a standalone server sufficient or do you need a high availability architecture? | [Installing the server](../installing/) or [Deploying high availability](ha_pem/) |
| Do you need to implement connection pooling? | [Deploying connection pooling](pem_pgbouncer/) |
-| What type of authentication to use? | [Authentication options](authentication_options/) |
+| What type of authentication should you use? | [Authentication options](authentication_options/) |
| What actions should you take to avoid security vulnerabilities? | [Securing your deployment](pem_security_best_practices/) |
-| Where to host the web server? | [Web server installation options](installing_pem_server_and_apache_web_server_preferences) |
-
+| Where should you host the web server? | [Web server installation options](installing_pem_server_and_apache_web_server_preferences) |
diff --git a/product_docs/docs/pem/10/considerations/installing_pem_server_and_apache_web_server_preferences.mdx b/product_docs/docs/pem/10/considerations/installing_pem_server_and_apache_web_server_preferences.mdx
index 5cce3fe4d48..ab2bfa6d9f8 100644
--- a/product_docs/docs/pem/10/considerations/installing_pem_server_and_apache_web_server_preferences.mdx
+++ b/product_docs/docs/pem/10/considerations/installing_pem_server_and_apache_web_server_preferences.mdx
@@ -9,19 +9,19 @@ redirects:
---
-During the PEM server installation, you can specify your hosting preferences for the web server.
+While installing the PEM server, you can specify your hosting preferences for the web server.
For production environments, best practice is to have the PEM server and web server on separate hosts.
## PEM server and web server on separate hosts
1. Install the PEM server on both the hosts. See [Installing the PEM server](../installing/).
-2. Configure the PEM server host by selecting the **Database** option on the first host.
-3. Configure a web server by selecting the **Web Services** option on the second host.
+1. Configure the PEM server host by selecting the **Database** option on the first host.
+1. Configure a web server by selecting the **Web Services** option on the second host.
For more information about configuring a PEM server, see [Configuring the PEM server on Linux platforms](../installing/configuring_the_pem_server_on_linux/).
## PEM server and web server on the same host
1. Install the PEM server. See [Installing the PEM server](../installing/).
-2. Run the configuration script. Select the **Web Services and Database** option to install the PEM server and web server on the same host. See [Configuring the PEM server on Linux](../installing/configuring_the_pem_server_on_linux/).
+1. Run the configuration script. To install the PEM server and web server on the same host, select the **Web Services and Database** option. See [Configuring the PEM server on Linux](../installing/configuring_the_pem_server_on_linux/).
diff --git a/product_docs/docs/pem/10/considerations/pem_security_best_practices/pem_application_configuration.mdx b/product_docs/docs/pem/10/considerations/pem_security_best_practices/pem_application_configuration.mdx
index 8de3657b205..2bd09fe78f1 100644
--- a/product_docs/docs/pem/10/considerations/pem_security_best_practices/pem_application_configuration.mdx
+++ b/product_docs/docs/pem/10/considerations/pem_security_best_practices/pem_application_configuration.mdx
@@ -9,7 +9,7 @@ redirects:
## Session timeout
-Setting the session expiration time too long in the web application increases the exposure to other session-based attacks. The attacker has more time to reuse a valid session ID and hijack the associated session. The shorter the session interval is, the less time an attacker has to use the valid session ID. We recommend that you set the inactivity timeout for the web application to a low value to avoid this security issue.
+Setting the session expiration time too long in the web application increases the exposure to other session-based attacks. The attacker has more time to reuse a valid session ID and hijack the associated session. The shorter the session interval is, the less time an attacker has to use the valid session ID. To avoid this security issue, we recommend that you set the inactivity timeout for the web application to a low value.
In PEM, you can set the timeout value for a user session. When there's no user activity for a specified duration on the web console, PEM logs the user out of the web console. A PEM administrator can set the length of time for inactivity. This value is for the whole application, not for each user.
@@ -89,7 +89,7 @@ Often, preventing an SQL injection attack is the responsibility of the applicati
SQL/Protect is a module that allows a database administrator to protect a database from SQL injection attacks. SQL/Protect examines incoming queries for typical SQL injection profiles in addition to the standard database security policies.
-Attackers can perpetrate SQL injection attacks with several different techniques. A specific signature characterizes each technique. SQL/Protect examines queries for unauthorized relations, utility commands, SQL tautology, and unbounded DML statements. SQL/Protect gives the control back to the database administrator by alerting the administrator to potentially dangerous queries and then blocking those queries.
+Attackers can perpetrate SQL injection attacks using several different techniques. A specific signature characterizes each technique. SQL/Protect examines queries for unauthorized relations, utility commands, SQL tautology, and unbounded DML statements. SQL/Protect gives the control back to the database administrator by alerting the administrator to potentially dangerous queries and then blocking those queries.
!!! Note
This plugin is useful only when your PEM database is hosted on the EDB Postgres Advanced Server server. It doesn't work on other servers.
@@ -98,7 +98,7 @@ For detailed information about the SQL Profiler plugin, see [SQL Profiler](../..
## Password management
-One security tip for PEM administrative users is to change your PEM login passwords to something new regularly. Changing your password:
+One security tip for PEM administrative users is to regularly change your PEM login passwords to something new. Changing your password:
- Prevents breaches of multiple accounts
- Prevents constant access
diff --git a/product_docs/docs/pem/10/considerations/setup_ha_using_efm.mdx b/product_docs/docs/pem/10/considerations/setup_ha_using_efm.mdx
new file mode 100644
index 00000000000..27a37fb6c8c
--- /dev/null
+++ b/product_docs/docs/pem/10/considerations/setup_ha_using_efm.mdx
@@ -0,0 +1,513 @@
+---
+title: "Using Failover Manager for high availability"
+navTitle: "Deploying high availability"
+redirects:
+- /pem/latest/pem_ha_setup/
+- /pem/latest/pem_ha_setup/setup_ha_using_efm/
+---
+
+!!! Important
+This page is under review and has not been updated for PEM 10.
+We plan to publish new documentation on HA patterns in PEM alongside some software changes to facilitate these patterns in PEM 10.1.
+!!!
+
+!!! Note
+ This procedure is for setting up Failover Manager for a PEM server with a new installation, not with an existing one. The provided commands apply to configuring RHEL-based systems where HTTPD is used for the web server services.
+
+Postgres Enterprise Manager (PEM) helps database administrators, system architects, and performance analysts administer, monitor, and tune Postgres database servers.
+
+Failover Manager is a high-availability tool from EDB that enables a Postgres primary node to fail over to a standby node during a software or hardware failure on the primary.
+
+The examples that follow use these IP addresses:
+
+- 172.16.161.200 - PEM Primary
+- 172.16.161.201 - PEM Standby 1
+- 172.16.161.202 - PEM Standby 2
+- 172.16.161.203 - EFM Witness Node
+- 172.16.161.245 - PEM VIP (used by agents and users to connect)
+
+The following must use the VIP address:
+
+- The PEM agent binding of the monitored database servers
+- Accessing the PEM web client
+- Accessing the web server services
+
+## Initial product installation and configuration
+
+1. Install the following on the primary and one or more standbys:
+
+ - [EDB Postgres Advanced Server](/epas/latest/installing/) (backend database for PEM server)
+ - [PEM server](/pem/latest/installing/)
+ - [EDB Failover Manager 4.1](/efm/latest/installing/)
+
+ Refer to these installation instructions in the product documentation, or see the instructions on the [EDB repos website](https://repos.enterprisedb.com). To access the EDB repositories, replace `USERNAME:PASSWORD` with your username and password in the instructions.
+
+ Make sure that the database server is configured to use the scram-sha-256 authentication method, as the PEM server configuration script doesn't work with trust authentication.
+
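+   For example, a sketch of the relevant `postgresql.conf` setting (set it before creating PEM users so that their passwords are stored as SCRAM hashes):
+
+   ```ini
+   password_encryption = scram-sha-256
+   ```
+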
+ You must install the `java-1.8.0-openjdk` package to install EFM.
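+
+   For example, on RHEL-based systems (a sketch; the package manager and package availability can vary):
+
+   ```shell
+   sudo dnf -y install java-1.8.0-openjdk
+   ```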
+
+1. Configure the PEM server on the primary server as well as on all the standby servers with an initial configuration of type 1 (web services and database):
+
+ ```shell
+ /usr/edb/pem/bin/configure-pem-server.sh -t 1
+ ```
+ For more detail on configuration types, see [Configuring the PEM server on Linux](/pem/latest/installing/configuring_the_pem_server_on_linux/).
+
+1. To allow access, open the following ports in the firewall on the primary and all the standby servers:
+
+ - `8443` for PEM server (https)
+ - `5444` for EDB Postgres Advanced Server 13
+ - `7800` for EFM
+   - `7809` for EFM admin
+
+ For example:
+
+ ```shell
+ $ sudo firewall-cmd --zone=public --add-port=5444/tcp --permanent
+ success
+ $ sudo firewall-cmd --zone=public --add-port=8443/tcp --permanent
+ success
+ $ sudo firewall-cmd --zone=public --add-port=7800/tcp --permanent
+ success
+ $ sudo firewall-cmd --zone=public --add-port=7809/tcp --permanent
+ success
+ $ sudo firewall-cmd --reload
+ success
+ ```
+
+## Set up the primary node for streaming replication
+
+1. Create the replication role, replacing `<password>` with the password you choose.
+
+ ```shell
+   $ /usr/edb/as13/bin/psql -h 172.16.161.200 -p 5444 -U enterprisedb edb -c "CREATE ROLE repl REPLICATION LOGIN PASSWORD '<password>';"
+ CREATE ROLE
+ ```
+
+1. Configure the following in the `postgresql.conf` file:
+
+ ```ini
+ wal_level = replica
+ max_wal_senders = 10
+ wal_keep_size = 500
+ max_replication_slots = 10
+ ```
+
+ For more information on configuring parameters for streaming replication, see the [PostgreSQL documentation](https://www.postgresql.org/docs/13/warm-standby.html#STREAMING-REPLICATION).
+
+ !!! Note
+ The configuration parameters might differ for different versions of the database server. You can email EDB Support at [techsupport@enterprisedb.com](mailto:techsupport@enterprisedb.com) for help with setting up these parameters.
+
+1. Add the following entry to the host-based authentication file (`/var/lib/edb/as13/data/pg_hba.conf`) to allow the replication user to connect from all the standbys:
+
+ ```shell
+ hostssl replication repl 172.16.161.201/24 scram-sha-256
+ ```
+
+ !!! Note
+     You can change the CIDR range of the IP address, if needed.
+
+1. Modify the host-based authentication file (`/var/lib/edb/as13/data/pg_hba.conf`) to allow the pem_user role to connect to all databases using the scram-sha-256 authentication method:
+
+ ```shell
+ # Allow local PEM agents and admins to connect to PEM server
+ hostssl all +pem_user 172.16.161.201/24 scram-sha-256
+ hostssl pem +pem_user 127.0.0.1/32 scram-sha-256
+ hostssl pem +pem_agent 127.0.0.1/32 cert
+ # Allow remote PEM agents and users to connect to PEM server
+ hostssl pem +pem_user 0.0.0.0/0 scram-sha-256
+ hostssl pem +pem_agent 0.0.0.0/0 cert
+ ```
+
+1. Restart the EDB Postgres Advanced Server 13 server.
+
+ ```shell
+ systemctl restart edb-as-13.service
+ ```
+
+## Set up the standby nodes for streaming replication
+
+1. Stop the service for EDB Postgres Advanced Server 13 on all the standby nodes:
+
+ ```shell
+ $ systemctl stop edb-as-13.service
+ ```
+
+ !!! Note
+ This example uses the pg_basebackup utility to create the replicas of the PEM backend database server on the standby servers. When using pg_basebackup, you need to stop the existing database server and remove the existing data directories.
+
+1. Remove the data directory of the database server on all the standby nodes:
+
+ ```shell
+ $ sudo su - enterprisedb
+
+ $ rm -rf /var/lib/edb/as13/data/*
+ ```
+
+1. Create the `.pgpass` file in the home directory of the enterprisedb user on all the standby nodes:
+
+ ```shell
+ $ sudo su - enterprisedb
+
+ $ cat > ~/.pgpass << _EOF_
+ 172.16.161.200:5444:replication:repl:CHANGE_ME
+ 172.16.161.201:5444:replication:repl:CHANGE_ME
+ 172.16.161.202:5444:replication:repl:CHANGE_ME
+ _EOF_
+
+ $ chmod 600 ~/.pgpass
+ ```
+
+1. Take a backup of the primary node on each of the standby nodes using pg_basebackup:
+
+ ```shell
+ $ sudo su - enterprisedb /usr/edb/as13/bin/pg_basebackup -h 172.16.161.200 \
+ -D /var/lib/edb/as13/data -U repl -v -P -Fp -R -p 5444
+ ```
+
+   The `pg_basebackup` command creates the `postgresql.auto.conf` and `standby.signal` files on the standby nodes. The `postgresql.auto.conf` file has the following content:
+
+ ```shell
+   $ sudo su - enterprisedb -c "cat /var/lib/edb/as13/data/postgresql.auto.conf"
+ # Do not edit this file manually
+ # It will be overwritten by the ALTER SYSTEM command.
+   primary_conninfo = 'user=repl passfile=''/var/lib/edb/.pgpass'' channel_binding=prefer host=172.16.161.200 port=5444 sslmode=prefer sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsvrname=postgres target_session_attrs=any'
+ ```
+
+1. In the `postgresql.conf` file on each of the standby nodes, edit the following parameter:
+
+ ```ini
+ hot_standby = on
+ ```
+
+1. Start the EDB Postgres Advanced Server 13 database server on each of the standby nodes:
+
+ ```shell
+ $ systemctl enable edb-as-13
+
+ $ systemctl start edb-as-13
+ ```
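+
+   To verify that both standbys are streaming from the primary, you can query the `pg_stat_replication` view on the primary node. For example:
+
+   ```shell
+   $ /usr/edb/as13/bin/psql -h 172.16.161.200 -p 5444 -U enterprisedb edb \
+       -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"
+   ```
+
+   The output shows one `streaming` row for each standby (172.16.161.201 and 172.16.161.202).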
+
+1. Copy the following files from the primary node to the same location on the standby nodes, overwriting any existing files, and set the permissions on them:
+
+ - `/etc/httpd/conf.d/edb-pem.conf`
+ - `/etc/httpd/conf.d/edb-ssl-pem.conf`
+ - `/root/.pem/agent1.crt`
+ - `/root/.pem/agent1.key`
+ - `/usr/edb/pem/agent/etc/agent.cfg`
+ - `/usr/edb/pem/share/.install-config`
+ - `/usr/edb/pem/web/pem.wsgi`
+ - `/usr/edb/pem/web/config_setup.py`
+
+   For example:
+
+   ```shell
+   $ mkdir -p /root/.pem
+   $ chown root:root /root/.pem
+   $ chmod 0755 /root/.pem
+   $ mkdir -p /var/lib/pemhome/.pem
+   $ chown pem:pem /var/lib/pemhome/.pem
+   $ chmod 0700 /var/lib/pemhome/.pem
+   $ mkdir -p /usr/edb/pem/logs
+   $ chown root:root /usr/edb/pem/logs
+   $ chmod 0755 /usr/edb/pem/logs
+   $ for file in /etc/httpd/conf.d/edb-pem.conf \
+         /etc/httpd/conf.d/edb-ssl-pem.conf \
+         /root/.pem/agent1.crt \
+         /usr/edb/pem/agent/etc/agent.cfg \
+         /usr/edb/pem/share/.install-config \
+         /usr/edb/pem/web/pem.wsgi \
+         /usr/edb/pem/web/config_setup.py; do \
+       chown root:root ${file}; \
+       chmod 0644 ${file}; \
+     done
+   $ chmod 0600 /root/.pem/agent1.key
+   $ chown root:root /root/.pem/agent1.key
+   ```
+
+   These commands ensure that the web server is configured on the standby but disabled by default. EFM enables the web server during switchover.
+
+   !!! Note
+       Whenever the certificates are updated, manually sync them between the primary and the standbys.
+
+1. Run the `configure-selinux.sh` script to configure the SELinux policy for PEM:
+
+   ```shell
+   $ /usr/edb/pem/bin/configure-selinux.sh
+   getenforce found, now executing 'getenforce' command
+   Configure the httpd to work with the SELinux
+   Allow the httpd to connect the database (httpd_can_network_connect_db = on)
+   Allow the httpd to connect the network (httpd_can_network_connect = on)
+   Allow the httpd to work with cgi (httpd_enable_cgi = on)
+   Allow to read & write permission on the 'pem' user home directory
+   SELinux policy is configured for PEM
+
+   $ sudo chmod 640 /root/.pem/agent1.crt
+   ```
+
+1. If the HTTPD and PEM agent services are running on any of the replica nodes, stop and disable them:
+
+   ```shell
+   $ systemctl stop pemagent
+   $ systemctl stop httpd
+   $ systemctl disable pemagent
+   $ systemctl disable httpd
+   ```
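+
+   To confirm that neither service starts at boot on the replicas, you can check, for example:
+
+   ```shell
+   $ systemctl is-enabled httpd pemagent
+   disabled
+   disabled
+   ```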
+
+!!! Note
+    At this point, the PEM primary server and two standbys are set up. Either standby can take over from the primary whenever needed.
+
+
+## Set up EFM to manage failover on all hosts
+
+1. Prepare the primary node to support EFM:
+
+   - Create a database user named efm to connect to the database servers.
+   - Grant execute privileges on the WAL log functions and monitoring privileges to the efm user.
+   - Add entries in `pg_hba.conf` to allow the efm database user to connect to the database server from all the hosts.
+   - Reload the configuration on all the database servers.
+
+ For example:
+
+   ```shell
+ $ cat > /tmp/efm-role.sql << _EOF_
+ -- Create a role for EFM
+ CREATE ROLE efm LOGIN PASSWORD 'password';
+
+ -- Give privilege to 'efm' user to connect to a database
+ GRANT CONNECT ON DATABASE edb TO efm;
+
+ -- Give privilege to 'efm' user to do backup operations
+ GRANT EXECUTE ON FUNCTION pg_current_wal_lsn() TO efm;
+ GRANT EXECUTE ON FUNCTION pg_last_wal_replay_lsn() TO efm;
+ GRANT EXECUTE ON FUNCTION pg_wal_replay_resume() TO efm;
+ GRANT EXECUTE ON FUNCTION pg_wal_replay_pause() TO efm;
+ GRANT EXECUTE ON FUNCTION pg_reload_conf() TO efm;
+
+ -- Grant monitoring privilege to the 'efm' user
+ GRANT pg_monitor TO efm;
+ _EOF_
+
+ $ /usr/edb/as13/bin/psql -h 172.16.161.200 -p 5444 -U enterprisedb edb -f /tmp/efm-role.sql
+ CREATE ROLE
+ GRANT
+ GRANT
+ GRANT
+ GRANT
+ GRANT
+ GRANT
+ GRANT ROLE
+
+ $ rm -f /tmp/efm-role.sql
+
+   $ cat >> /var/lib/edb/as13/data/pg_hba.conf << _EOF_
+ hostssl edb efm 172.16.161.200/32 scram-sha-256
+ hostssl edb efm 172.16.161.201/32 scram-sha-256
+ hostssl edb efm 172.16.161.202/32 scram-sha-256
+ hostssl edb efm 172.16.161.203/32 scram-sha-256
+ _EOF_
+
+   $ /usr/edb/as13/bin/psql -h 172.16.161.200 -p 5444 -U enterprisedb edb -c "SELECT pg_reload_conf();"
+ ```
+
+1. Create the scripts on each node to start and stop the web server and the PEM agent:
+
+ ```shell
+   $ cat << _EOF_ | sudo tee /usr/local/bin/start-pemagent.sh
+   #!/bin/sh
+   /bin/sudo /bin/systemctl enable httpd
+   /bin/sudo /bin/systemctl start httpd
+   /bin/sudo /bin/systemctl enable pemagent
+   /bin/sudo /bin/systemctl start pemagent
+   _EOF_
+   $ cat << _EOF_ | sudo tee /usr/local/bin/stop-pemagent.sh
+   #!/bin/sh
+   /bin/sudo /bin/systemctl stop pemagent
+   /bin/sudo /bin/systemctl disable pemagent
+   /bin/sudo /bin/systemctl stop httpd
+   /bin/sudo /bin/systemctl disable httpd
+   _EOF_
+   $ sudo chmod 770 /usr/local/bin/start-pemagent.sh
+   $ sudo chmod 770 /usr/local/bin/stop-pemagent.sh
+ ```
+
+1. Create a `sudoers` file (`/etc/sudoers.d/efm-pem`) on each node to allow the efm user to start and stop the httpd and pemagent services:
+
+ ```shell
+   $ cat << _EOF_ | sudo tee /etc/sudoers.d/efm-pem
+   efm ALL=(ALL) NOPASSWD: /bin/systemctl enable pemagent
+   efm ALL=(ALL) NOPASSWD: /bin/systemctl disable pemagent
+   efm ALL=(ALL) NOPASSWD: /bin/systemctl stop pemagent
+   efm ALL=(ALL) NOPASSWD: /bin/systemctl start pemagent
+   efm ALL=(ALL) NOPASSWD: /bin/systemctl status pemagent
+   efm ALL=(ALL) NOPASSWD: /bin/systemctl enable httpd
+   efm ALL=(ALL) NOPASSWD: /bin/systemctl disable httpd
+   efm ALL=(ALL) NOPASSWD: /bin/systemctl stop httpd
+   efm ALL=(ALL) NOPASSWD: /bin/systemctl start httpd
+   efm ALL=(ALL) NOPASSWD: /bin/systemctl status httpd
+   _EOF_
+ ```
+
+1. Create an `efm.nodes` file on all nodes using the sample file (`/etc/edb/efm-4.1/efm.nodes.in`), and give read-write access to the efm OS user:
+
+ ```shell
+ $ sudo cp /etc/edb/efm-4.1/efm.nodes.in /etc/edb/efm-4.1/efm.nodes
+ $ sudo chown efm:efm /etc/edb/efm-4.1/efm.nodes
+ $ sudo chmod 600 /etc/edb/efm-4.1/efm.nodes
+ ```
+
+1. Add the IP address and efm port of the primary node in the `/etc/edb/efm-4.1/efm.nodes` file on the standby nodes:
+
+ ```shell
+   $ cat << _EOF_ | sudo tee /etc/edb/efm-4.1/efm.nodes
+   172.16.161.200:7800
+   _EOF_
+ ```
+
+1. Create the `efm.properties` file on all the nodes using the sample file (`/etc/edb/efm-4.1/efm.properties.in`). Grant read access to all the users:
+
+ ```shell
+ $ sudo cp /etc/edb/efm-4.1/efm.properties.in /etc/edb/efm-4.1/efm.properties
+ $ sudo chown efm:efm /etc/edb/efm-4.1/efm.properties
+ $ sudo chmod a+r /etc/edb/efm-4.1/efm.properties
+ ```
+
+1. Encrypt the efm user's password using the efm utility:
+
+ ```shell
+ $ export EFMPASS=password
+ $ /usr/edb/efm-4.1/bin/efm encrypt efm --from-env
+ 096666746b05b081d1a98e43d94c9dad
+ ```
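+
+   The string printed by the command is the encrypted password. Copy it into the `db.password.encrypted` parameter in the next step. You can then clear the plain-text value from the environment:
+
+   ```shell
+   $ unset EFMPASS
+   ```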
+
+1. Edit the following parameters in the properties file:
+
+ ```ini
+ db.user=efm
+ db.password.encrypted=096666746b05b081d1a98e43d94c9dad
+ db.port=5444
+ db.database=edb
+ db.service.owner=enterprisedb
+ db.service.name=edb-as-13
+ db.bin=/usr/edb/as13/bin
+ db.data.dir=/var/lib/edb/as13/data
+ jdbc.sslmode=require
+ user.email=username@example.com
+ from.email=node1@efm-pem
+ notification.level=INFO
+ notification.text.prefix=[PEM/EFM]
+ bind.address=172.16.161.200:7800
+ admin.port=7809
+ is.witness=false
+ local.period=10
+ local.timeout=60
+ local.timeout.final=10
+ remote.timeout=10
+ node.timeout=50
+ encrypt.agent.messages=true
+ stop.isolated.primary=true
+ stop.failed.primary=true
+ primary.shutdown.as.failure=false
+ update.physical.slots.period=0
+ ping.server.ip=8.8.8.8
+ ping.server.command=/bin/ping -q -c3 -w5
+ auto.allow.hosts=false
+ stable.nodes.file=false
+ db.reuse.connection.count=0
+ auto.failover=true
+ auto.reconfigure=true
+ promotable=true
+ use.replay.tiebreaker=true
+ standby.restart.delay=0
+ reconfigure.num.sync=false
+ reconfigure.sync.primary=false
+ minimum.standbys=0
+ recovery.check.period=1
+ restart.connection.timeout=60
+ auto.resume.period=0
+ virtual.ip=172.16.161.245
+ virtual.ip.interface=ens33
+ virtual.ip.prefix=24
+ virtual.ip.single=true
+ check.vip.before.promotion=true
+ pgpool.enable=false
+ sudo.command=sudo
+ sudo.user.command=sudo -u %u
+ syslog.host=localhost
+ syslog.port=514
+ syslog.protocol=UDP
+ syslog.facility=LOCAL1
+ file.log.enabled=true
+ syslog.enabled=false
+ jgroups.loglevel=INFO
+ efm.loglevel=INFO
+ jvm.options=-Xmx128m
+ script.remote.post.promotion=/usr/local/bin/stop-pemagent.sh
+ script.post.promotion=/usr/local/bin/start-pemagent.sh
+ ```
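+
+   On each standby and witness host, set `bind.address` to that node's own IP address. For example, on the first standby:
+
+   ```ini
+   bind.address=172.16.161.201:7800
+   ```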
+
+1. Set the value of the `is.witness` configuration parameter on the witness node to `true`:
+
+ ```ini
+ is.witness=true
+ ```
+
+1. Enable and start the EFM service on the primary node:
+
+ ```shell
+ $ systemctl enable edb-efm-4.1
+ $ systemctl start edb-efm-4.1
+ ```
+
+1. Allow the standbys to join the cluster started on the primary node:
+
+ ```shell
+   $ /usr/edb/efm-4.1/bin/efm allow-node efm 172.16.161.201
+   $ /usr/edb/efm-4.1/bin/efm allow-node efm 172.16.161.202
+   $ /usr/edb/efm-4.1/bin/efm allow-node efm 172.16.161.203
+ ```
+
+1. Enable and start the EFM service on the standby nodes and the EFM witness node:
+
+ ```shell
+ $ systemctl enable edb-efm-4.1
+ $ systemctl start edb-efm-4.1
+ ```
+
+1. Check the EFM cluster status from any node:
+
+ ```shell
+ $ sudo /usr/edb/efm-4.1/bin/efm cluster-status efm
+ Cluster Status: efm
+ Agent Type Address DB VIP
+ ----------------------------------------------------------------
+ Primary 172.16.161.200 UP 172.16.161.245*
+ Standby 172.16.161.201 UP 172.16.161.245
+ Standby 172.16.161.202 UP 172.16.161.245
+ Witness 172.16.161.203 N/A 172.16.161.245
+
+ Allowed node host list:
+ 172.16.161.200 172.16.161.201 172.16.161.202 172.16.161.203
+
+ Membership coordinator: 172.16.161.200
+
+ Standby priority host list:
+ 172.16.161.201 172.16.161.202
+
+ Promote Status:
+
+ DB Type Address WAL Received LSN WAL Replayed LSN Info
+ ---------------------------------------------------------------------------
+ Primary 172.16.161.200 0/F7A3808
+ Standby 172.16.161.201 0/F7A3808 0/F7A3808
+ Standby 172.16.161.202 0/F7A3808 0/F7A3808
+
+ Standby database(s) in sync with primary. It is safe to promote.
+ ```
+
+This status confirms that EFM is set up successfully and is managing failover for the PEM server.
+
+During a failover, one of the standbys is promoted to primary, and the PEM agents connect to the new primary node. You can then replace the failed primary node with a new standby using this procedure.
+
+## Current limitations
+
+The current limitations include:
+
+- Web console sessions for the users are lost during switchover.
+- Per-user settings set from the Preferences dialog box are lost, as they're stored in local configuration files on the file system.
+- Background processes started by the Backup, Restore, and Maintenance dialog boxes, and their logs, aren't shared between the systems. They're lost during switchover.
From 867d7a3205fd091fc519b6ee1b65e24c4a5d4114 Mon Sep 17 00:00:00 2001
From: nidhibhammar <59045594+nidhibhammar@users.noreply.github.com>
Date: Mon, 19 May 2025 14:22:03 +0530
Subject: [PATCH 37/42] removed ha_using_efm file
---
.../10/considerations/setup_ha_using_efm.mdx | 513 ------------------
1 file changed, 513 deletions(-)
delete mode 100644 product_docs/docs/pem/10/considerations/setup_ha_using_efm.mdx
diff --git a/product_docs/docs/pem/10/considerations/setup_ha_using_efm.mdx b/product_docs/docs/pem/10/considerations/setup_ha_using_efm.mdx
deleted file mode 100644
index 27a37fb6c8c..00000000000
--- a/product_docs/docs/pem/10/considerations/setup_ha_using_efm.mdx
+++ /dev/null
@@ -1,513 +0,0 @@
----
-title: "Using Failover Manager for high availability "
-navTitle: "Deploying high availability"
-redirects:
-- /pem/latest/pem_ha_setup/
-- /pem/latest/pem_ha_setup/setup_ha_using_efm/
----
-
-!!! Important
-This page is under review and has not been updated for PEM 10.
-We plan to publish new documentation on HA patterns in PEM alongside some software changes to facilitate these patterns in PEM 10.1.
-!!!
-
-!!! Note
- This procedure is for setting up Failover Manager for a PEM server with a new installation, not with an existing one. The provided commands apply to configuring RHEL-based systems where HTTPD is used for the web server services.
-
-Postgres Enterprise Manager (PEM) helps database administrators, system architects, and performance analysts to administer, monitor, and tune Postgres database servers.
-
-Failover Manager is a high-availability tool from EDB that enables a Postgres primary node to failover to a standby node during a software or hardware failure on the primary.
-
-The examples that follow use these IP addresses:
-
-- 172.16.161.200 - PEM Primary
-- 172.16.161.201 - PEM Standby 1
-- 172.16.161.202 - PEM Standby 2
-- 172.16.161.203 - EFM Witness Node
-- 172.16.161.245 - PEM VIP (used by agents and users to connect)
-
-The following must use the VIP address:
-
-- The PEM agent binding of the monitored database servers
-- Accessing the PEM web client
-- Accessing the webserver services
-
-## Initial product installation and configuration
-
-1. Install the following on the primary and one or more standbys:
-
- - [EDB Postgres Advanced Server](/epas/latest/installing/) (backend database for PEM server)
- - [PEM server](/pem/latest/installing/)
- - [EDB Failover Manager 4.1](/efm/latest/installing/)
-
- Refer to these installation instructions in the product documentation, or see the instructions on the [EDB repos website](https://repos.enterprisedb.com). To access the EDB repositories, replace `USERNAME:PASSWORD` with your username and password in the instructions.
-
- Make sure that the database server is configured to use the scram-sha-256 authentication method, as the PEM server configuration script doesn't work with trust authentication.
-
- You must install the `java-1.8.0-openjdk` package to install EFM.
-
-1. Configure the PEM server on the primary server as well as on all the standby servers with an initial configuration of type 1 (web services and database):
-
- ```shell
- /usr/edb/pem/bin/configure-pem-server.sh -t 1
- ```
- For more detail on configuration types, see [Configuring the PEM server on Linux](/pem/latest/installing/configuring_the_pem_server_on_linux/).
-
-1. To allow the access, add the following ports in the firewall on the primary and all the standby servers:
-
- - `8443` for PEM server (https)
- - `5444` for EDB Postgres Advanced Server 13
- - `7800` for EFM
- - `7908` for EFM admin
-
- For example:
-
- ```shell
- $ sudo firewall-cmd --zone=public --add-port=5444/tcp --permanent
- success
- $ sudo firewall-cmd --zone=public --add-port=8443/tcp --permanent
- success
- $ sudo firewall-cmd --zone=public --add-port=7800/tcp --permanent
- success
- $ sudo firewall-cmd --zone=public --add-port=7809/tcp --permanent
- success
- $ sudo firewall-cmd --reload
- success
- ```
-
-## Set up the primary node for streaming replication
-
-1. Create the replication role, replacing `` with the password you choose.
-
- ```shell
- $ /usr/edb/as13/bin/psql -h 172.16.161.200 -p 5444 -U enterprisedb edb -c “CREATE ROLE repl REPLICATION LOGIN PASSWORD ''”;
- CREATE ROLE
- ```
-
-1. Configure the following in the `postgresql.conf` file:
-
- ```ini
- wal_level = replica
- max_wal_senders = 10
- wal_keep_size = 500
- max_replication_slots = 10
- ```
-
- For more information on configuring parameters for streaming replication, see the [PostgreSQL documentation](https://www.postgresql.org/docs/13/warm-standby.html#STREAMING-REPLICATION).
-
- !!! Note
- The configuration parameters might differ for different versions of the database server. You can email EDB Support at [techsupport@enterprisedb.com](mailto:techsupport@enterprisedb.com) for help with setting up these parameters.
-
-1. Add the following entry in the host-based authentication (`/var/lib/edb/as13/data/pg_hba.conf`) file to allow the replication user to connect from all the standbys:
-
- ```shell
- hostssl replication repl 172.16.161.201/24 scram-sha-256
- ```
-
- !!! Note
- You can change the cidr range of the IP address, if needed.
-
-1. Modify the host-based authentication file (`/var/lib/edb/as13/data/pg_hba.conf`) for the pem_user role to connect to all databases using the scram-sha-256 authentication method:
-
- ```shell
- # Allow local PEM agents and admins to connect to PEM server
- hostssl all +pem_user 172.16.161.201/24 scram-sha-256
- hostssl pem +pem_user 127.0.0.1/32 scram-sha-256
- hostssl pem +pem_agent 127.0.0.1/32 cert
- # Allow remote PEM agents and users to connect to PEM server
- hostssl pem +pem_user 0.0.0.0/0 scram-sha-256
- hostssl pem +pem_agent 0.0.0.0/0 cert
- ```
-
-1. Restart the EDB Postgres Advanced Server 13 server.
-
- ```shell
- systemctl restart edb-as-13.service
- ```
-
-## Set up the standby nodes for streaming replication
-
-1. Stop the service for EDB Postgres Advanced Server 13 on all the standby nodes:
-
- ```shell
- $ systemctl stop edb-as-13.service
- ```
-
- !!! Note
- This example uses the pg_basebackup utility to create the replicas of the PEM backend database server on the standby servers. When using pg_basebackup, you need to stop the existing database server and remove the existing data directories.
-
-1. Remove the data directory of the database server on all the standby nodes:
-
- ```shell
- $ sudo su - enterprisedb
-
- $ rm -rf /var/lib/edb/as13/data/*
- ```
-
-1. Create the `.pgpass` file in the home directory of the enterprisedb user on all the standby nodes:
-
- ```shell
- $ sudo su - enterprisedb
-
- $ cat > ~/.pgpass << _EOF_
- 172.16.161.200:5444:replication:repl:CHANGE_ME
- 172.16.161.201:5444:replication:repl:CHANGE_ME
- 172.16.161.202:5444:replication:repl:CHANGE_ME
- _EOF_
-
- $ chmod 600 ~/.pgpass
- ```
-
-1. Take the backup of the primary node on each of the standby nodes using pg_basebackup:
-
- ```shell
- $ sudo su - enterprisedb /usr/edb/as13/bin/pg_basebackup -h 172.16.161.200 \
- -D /var/lib/edb/as13/data -U repl -v -P -Fp -R -p 5444
- ```
-
- The `backup` command creates the `postgresql.auto.conf` and `standby.signal` files on the standby nodes. The `postgresql.auto.conf` file has the following content:
-
- ```shell
- sudo su - enterprisedb cat /var/lib/edb/as13/data/postgresql.auto.conf
- # Do not edit this file manually
- # It will be overwritten by the ALTER SYSTEM command.
- primary_conninfo = ‘user=repl passfile=’’/var/lib/edb/.pgpass’’ channel_binding=prefer host=172.16.161.200 port=5444 sslmode=prefer sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsvrname=postgres target_session_attrs=any’
- ```
-
-1. In the `postgresql.conf` file on each of the standby nodes, edit the following parameter:
-
- ```ini
- hot_standby = on
- ```
-
-1. Start the EDB Postgres Advanced Server 13 database server on each of the standby nodes:
-
- ```shell
- $ systemctl enable edb-as-13
-
- $ systemctl start edb-as-13
- ```
-
-1. Copy the following files from the primary node to the standby nodes at the same location, overwriting any existing files. Set the permissions on the files:
-
- - `/etc/httpd/conf.d/edb-pem.conf`
- - `/etc/httpd/conf.d/edb-ssl-pem.conf`
- - `/root/.pem/agent1.crt`
- - `/root/.pem/agent1.key`
- - `/usr/edb/pem/agent/etc/agent.cfg`
- - `/usr/edb/pem/share/.install-config`
- - `/usr/edb/pem/web/pem.wsgi`
- - `/usr/edb/pem/web/config_setup.py`
-
-For example:
-
-```shell
- $ mkdir -p /root/.pem
- $ chown root:root /root/.pem
- $ chmod 0755 /root/.pem
- $ mkdir -p /var/lib/pemhome/.pem
- $ chown pem:pem /var/lib/pemhome/.pem
- $ chmod 0700 /var/lib/pemhome/.pem
- $ mkdir -p /usr/edb/pem/logs
- $ chown root:root /usr/edb/pem/logs
- $ chmod 0755 /usr/edb/pem/logs
- $ for file in /etc/httpd/conf.d/edb-pem.conf \
- /etc/httpd/conf.d/edb-ssl-pem.conf \
- /root/.pem/agent1.crt \
- /usr/edb/pem/agent/etc/agent.cfg \
- /usr/edb/pem/share/.install-config \
- /usr/edb/pem/web/pem.wsgi \
- /usr/edb/pem/web/config_setup.py; do \
- chown root:root ${file}; \
- chmod 0644 ${file}; \
- done;
- $ chmod 0600 /root/.pem/agent1.key
- $ chown root:root /root/.pem/agent1.key
-```
-
-This code ensures that the webserver is configured on the standby and is disabled by default. Switchover by EFM enables the webserver.
-
-!!! Note
- Manually keep the certificates in sync on master and standbys whenever the certificates are updated.
-
-1. Run the `configure-selinux.sh` script to configure the SELinux policy for PEM:
-
-```shell
- $ /usr/edb/pem/bin/configure-selinux.sh
- getenforce found, now executing 'getenforce' command
- Configure the httpd to work with the SELinux
- Allow the httpd to connect the database (httpd_can_network_connect_db = on)
- Allow the httpd to connect the network (httpd_can_network_connect = on)
- Allow the httpd to work with cgi (httpd_enable_cgi = on)
- Allow to read & write permission on the 'pem' user home directory
- SELinux policy is configured for PEM
-
- $ sudo chmod 640 /root/.pem/agent1.crt
-```
-
-1. If HTTPD and PEM agent services are running on all replica nodes, disable and stop them:
-
-```shell
-systemctl stop pemagent
-systemctl stop httpd
-systemctl disable pemagent
-systemctl disable httpd
-```
-
-!!! Note
- At this point, a PEM primary server and two standbys are ready to take over from the primary whenever needed.
-
-
-## Set up EFM to manage failover on all hosts
-
-1. Prepare the primary node to support EFM:
-
- - Create a database user efm to connect to the database servers.
- - Grant the execute privileges on the functions related to WAL logs and the monitoring privileges to the user.
- - Add entries in `pg_hba.conf` to allow the efm database user to connect to the database server from all nodes on all the hosts.
- - Reload the configurations on all the database servers.
-
- For example:
-
- ```sql
- $ cat > /tmp/efm-role.sql << _EOF_
- -- Create a role for EFM
- CREATE ROLE efm LOGIN PASSWORD 'password';
-
- -- Give privilege to 'efm' user to connect to a database
- GRANT CONNECT ON DATABASE edb TO efm;
-
- -- Give privilege to 'efm' user to do backup operations
- GRANT EXECUTE ON FUNCTION pg_current_wal_lsn() TO efm;
- GRANT EXECUTE ON FUNCTION pg_last_wal_replay_lsn() TO efm;
- GRANT EXECUTE ON FUNCTION pg_wal_replay_resume() TO efm;
- GRANT EXECUTE ON FUNCTION pg_wal_replay_pause() TO efm;
- GRANT EXECUTE ON FUNCTION pg_reload_conf() TO efm;
-
- -- Grant monitoring privilege to the 'efm' user
- GRANT pg_monitor TO efm;
- _EOF_
-
- $ /usr/edb/as13/bin/psql -h 172.16.161.200 -p 5444 -U enterprisedb edb -f /tmp/efm-role.sql
- CREATE ROLE
- GRANT
- GRANT
- GRANT
- GRANT
- GRANT
- GRANT
- GRANT ROLE
-
- $ rm -f /tmp/efm-role.sql
-
- $ cat > /var/lib/edb/as13/data/pg_hba.conf <<< _EOF_
- hostssl edb efm 172.16.161.200/32 scram-sha-256
- hostssl edb efm 172.16.161.201/32 scram-sha-256
- hostssl edb efm 172.16.161.202/32 scram-sha-256
- hostssl edb efm 172.16.161.203/32 scram-sha-256
- _EOF_
-
- $ /usr/edb/as13/bin/psql -h 172.16.161.200 -p 5444 -U enterprisedb edb -c “SELECT pg_reload_conf();”
- ```
-
-1. Create the scripts on each node to start/stop the PEM agent:
-
- ```shell
- $ sudo cat > /usr/local/bin/start-httpd-pemagent.sh << _EOF_
- #!/bin/sh
- /bin/sudo /bin/systemctl enable httpd
- /bin/sudo /bin/systemctl start httpd
- /bin/sudo /bin/systemctl enable pemagent
- /bin/sudo /bin/systemctl start pemagent
- _EOF_
- $ sudo cat > /usr/local/bin/stop-httpd-pemagent.sh << _EOF_
- #!/bin/sh
-
- /bin/sudo /bin/systemctl stop pemagent
- /bin/sudo /bin/systemctl disable pemagent
- /bin/sudo /bin/systemctl stop httpd
- /bin/sudo /bin/systemctl disable httpd
- _EOF_
- $ sudo chmod 770 /usr/local/bin/start-pemagent.sh
- $ sudo chmod 770 /usr/local/bin/stop-pemagent.sh
- ```
-
-1. Create a `sudoers` file (`/etc/sudoers.d/efm-pem`) on each node to allow the efm user to start/stop the pemagent:
-
- ```shell
- $ sudo cat > /etc/sudoers.d/efm-pem << _EOF_
- efm ALL=(ALL) NOPASSWD: /bin/systemctl enable pemagent
- efm ALL=(ALL) NOPASSWD: /bin/systemctl disable pemagent
- efm ALL=(ALL) NOPASSWD: /bin/systemctl stop pemagent
- efm ALL=(ALL) NOPASSWD: /bin/systemctl start pemagent
- efm ALL=(ALL) NOPASSWD: /bin/systemctl status pemagent
- _EOF_
- ```
-
-1. Create an `efm.nodes` file on all nodes using the sample file (`/etc/edb/efm-4.1/efm.nodes.in`), and give read-write access to the efm OS user:
-
- ```shell
- $ sudo cp /etc/edb/efm-4.1/efm.nodes.in /etc/edb/efm-4.1/efm.nodes
- $ sudo chown efm:efm /etc/edb/efm-4.1/efm.nodes
- $ sudo chmod 600 /etc/edb/efm-4.1/efm.nodes
- ```
-
-1. Add the IP address and efm port of the primary node in the `/etc/edb/efm-4.1/efm.nodes` file on the standby nodes:
-
- ```shell
- $ sudo cat > /etc/edb/efm-4.1/efm.nodes <<< _EOF_
- 172.16.161.200:7800
- _EOF_
- ```
-
-1. Create the `efm.properties` file on all the nodes using the sample file (`/etc/edb/efm-4.1/efm.properties.in`). Grant read access to all the users:
-
- ```shell
- $ sudo cp /etc/edb/efm-4.1/efm.properties.in /etc/edb/efm-4.1/efm.properties
- $ sudo chown efm:efm /etc/edb/efm-4.1/efm.properties
- $ sudo chmod a+r /etc/edb/efm-4.1/efm.properties
- ```
-
-1. Encrypt the efm user's password using the efm utility:
-
- ```shell
- $ export EFMPASS=password
- $ /usr/edb/efm-4.1/bin/efm encrypt efm --from-env
- 096666746b05b081d1a98e43d94c9dad
- ```
-
-1. Edit the following parameters in the properties file:
-
- ```ini
- db.user=efm
- db.password.encrypted=096666746b05b081d1a98e43d94c9dad
- db.port=5444
- db.database=edb
- db.service.owner=enterprisedb
- db.service.name=edb-as-13
- db.bin=/usr/edb/as13/bin
- db.data.dir=/var/lib/edb/as13/data
- jdbc.sslmode=require
- user.email=username@example.com
- from.email=node1@efm-pem
- notification.level=INFO
- notification.text.prefix=[PEM/EFM]
- bind.address=172.16.161.200:7800
- admin.port=7809
- is.witness=false
- local.period=10
- local.timeout=60
- local.timeout.final=10
- remote.timeout=10
- node.timeout=50
- encrypt.agent.messages=true
- stop.isolated.primary=true
- stop.failed.primary=true
- primary.shutdown.as.failure=false
- update.physical.slots.period=0
- ping.server.ip=8.8.8.8
- ping.server.command=/bin/ping -q -c3 -w5
- auto.allow.hosts=false
- stable.nodes.file=false
- db.reuse.connection.count=0
- auto.failover=true
- auto.reconfigure=true
- promotable=true
- use.replay.tiebreaker=true
- standby.restart.delay=0
- reconfigure.num.sync=false
- reconfigure.sync.primary=false
- minimum.standbys=0
- recovery.check.period=1
- restart.connection.timeout=60
- auto.resume.period=0
- virtual.ip=172.16.161.245
- virtual.ip.interface=ens33
- virtual.ip.prefix=24
- virtual.ip.single=true
- check.vip.before.promotion=true
- pgpool.enable=false
- sudo.command=sudo
- sudo.user.command=sudo -u %u
- syslog.host=localhost
- syslog.port=514
- syslog.protocol=UDP
- syslog.facility=LOCAL1
- file.log.enabled=true
- syslog.enabled=false
- jgroups.loglevel=INFO
- efm.loglevel=INFO
- jvm.options=-Xmx128m
- script.remote.post.promotion=/usr/local/bin/stop-pemagent.sh
- script.post.promotion=/usr/local/bin/start-pemagent.sh
- ```
-
-1. Set the value of the `is.witness` configuration parameter on the witness node to `true`:
-
- ```ini
- is.witness=true
- ```
-
-1. Enable and start the EFM service on the primary node:
-
- ```shell
- $ systemctl enable edb-efm-4.1
- $ systemctl start edb-efm-4.1
- ```
-
-1. Allow the standbys to join the cluster started on the primary node:
-
- ```shell
- /usr/edb/efm-4.1/bin/efm allow-node efm 172.16.161.201
- /usr/edb/efm-4.1/bin/efm allow-node efm 172.16.161.202
- /usr/edb/efm-4.1/bin/efm allow-node efm 172.16.161.203
- ```
-
-1. Enable and start the EFM service on the standby nodes and the EFM witness node:
-
- ```shell
- $ systemctl enable edb-efm-4.1
- $ systemctl start edb-efm-4.1
- ```
-
-11. Check the EFM cluster status from any node:
-
- ```shell
- $ sudo /usr/edb/efm-4.1/bin/efm cluster-status efm
- Cluster Status: efm
- Agent Type Address DB VIP
- ----------------------------------------------------------------
- Primary 172.16.161.200 UP 172.16.161.245*
- Standby 172.16.161.201 UP 172.16.161.245
- Standby 172.16.161.202 UP 172.16.161.245
- Witness 172.16.161.203 N/A 172.16.161.245
-
- Allowed node host list:
- 172.16.161.200 172.16.161.201 172.16.161.202 172.16.161.203
-
- Membership coordinator: 172.16.161.200
-
- Standby priority host list:
- 172.16.161.201 172.16.161.202
-
- Promote Status:
-
- DB Type Address WAL Received LSN WAL Replayed LSN Info
- ---------------------------------------------------------------------------
- Primary 172.16.161.200 0/F7A3808
- Standby 172.16.161.201 0/F7A3808 0/F7A3808
- Standby 172.16.161.202 0/F7A3808 0/F7A3808
-
- Standby database(s) in sync with primary. It is safe to promote.
- ```
-
-This status confirms that EFM is set up successfully and managing the failover for the PEM server.
-
-In case of failover, any of the standbys are promoted as the primary node, and PEM agents connect to the new primary node. You can replace the failed primary node with a new standby using this procedure.
-
-## Current limitations
-
-The current limitations include:
-- Web console sessions for the users are lost during the switchover.
-- Per-user settings set from the Preferences dialog box are lost, as they’re stored in local configuration files on the file system.
-- Background processes started by the Backup, Restore, and Maintenance dialogs boxes and their logs aren't shared between the systems. They're lost during switchover.
From 376ad8d307b6464ac5b314f1bb6dc825d40f21a1 Mon Sep 17 00:00:00 2001
From: Dj Walker-Morgan
Date: Wed, 14 May 2025 17:53:18 +0100
Subject: [PATCH 38/42] Release Notes for 4.1.0 stubbed
Signed-off-by: Dj Walker-Morgan
---
.../ai-accelerator_4.1.0_rel_notes.mdx | 23 +++++++++++++++++++
.../ai-accelerator/rel_notes/index.mdx | 2 ++
.../rel_notes/src/rel_notes_4.1.0.yml | 17 ++++++++++++++
3 files changed, 42 insertions(+)
create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
create mode 100644 advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
new file mode 100644
index 00000000000..cad6f23262f
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
@@ -0,0 +1,23 @@
+---
+title: AI Accelerator - Pipelines 4.1.0 release notes
+navTitle: Version 4.1.0
+originalFilePath: advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
+editTarget: originalFilePath
+---
+
+Released: 19 May 2025
+
+This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline.
+
+## Highlights
+
+- MOAR AI
+
+## Enhancements
+
+Description | Addresses |
+Placeholder for future release note.
Soon.
+ | |
+
+
+
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx
index dbc87bf6dfd..a46870bd873 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/index.mdx
@@ -4,6 +4,7 @@ navTitle: Release notes
description: Release notes for EDB Postgres AI - AI Accelerator
indexCards: none
navigation:
+ - ai-accelerator_4.1.0_rel_notes
- ai-accelerator_4.0.1_rel_notes
- ai-accelerator_4.0.0_rel_notes
- ai-accelerator_3.0.1_rel_notes
@@ -22,6 +23,7 @@ The EDB Postgres AI - AI Accelerator describes the latest version of AI Accelera
| AI Accelerator version | Release Date |
|---|---|
+| [4.1.0](./ai-accelerator_4.1.0_rel_notes) | 19 May 2025 |
| [4.0.1](./ai-accelerator_4.0.1_rel_notes) | 09 May 2025 |
| [4.0.0](./ai-accelerator_4.0.0_rel_notes) | 05 May 2025 |
| [3.0.1](./ai-accelerator_3.0.1_rel_notes) | 03 Apr 2025 |
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
new file mode 100644
index 00000000000..d7c8eebe66a
--- /dev/null
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
@@ -0,0 +1,17 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/EnterpriseDB/docs/refs/heads/develop/tools/automation/generators/relgen/relnote-schema.json
+product: AI Accelerator - Pipelines
+version: 4.1.0
+date: 19 May 2025
+intro: |
+ This is a minor release that includes a few bug fixes and enhancements to the knowledge base pipeline.
+highlights: |
+ - MOAR AI
+relnotes:
+- relnote: Placeholder for future release note.
+ details: |
+ Soon.
+ jira: ""
+ addresses: ""
+ type: Enhancement
+ impact: Medium
+
From 70cec1fb0dc49296a134b890689cdf75fef0e2bc Mon Sep 17 00:00:00 2001
From: Dj Walker-Morgan
Date: Thu, 15 May 2025 11:26:07 +0100
Subject: [PATCH 39/42] Remove New from front page
Signed-off-by: Dj Walker-Morgan
---
src/pages/index.js | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/pages/index.js b/src/pages/index.js
index 4b30882af28..0b455c4443e 100644
--- a/src/pages/index.js
+++ b/src/pages/index.js
@@ -282,7 +282,7 @@ const Page = () => {
Get Started with Pipelines
- New: AI Accelerator Preparers
+ AI Accelerator Preparers
PGvector
From e6fb7cf0e856b076b5c24bc9deb9cf8e922698e6 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
<41898282+github-actions[bot]@users.noreply.github.com>
Date: Mon, 19 May 2025 09:15:26 +0000
Subject: [PATCH 40/42] update generated release notes
---
.../ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx | 1 -
1 file changed, 1 deletion(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
index 625b04b6b59..51174a0fb6e 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
@@ -14,7 +14,6 @@ This is a minor release that includes enhancements to the preparer pipeline and
- Automatic unnesting of Preparer results for operations that transform the shape of data.
- Batch processing for embeddings with external models.
-
## Enhancements
Description | Addresses |
From 4b78f4b1f16befda3de2c6f924ed308114d71f4c Mon Sep 17 00:00:00 2001
From: Dj Walker-Morgan
Date: Mon, 19 May 2025 15:26:43 +0100
Subject: [PATCH 41/42] Fix bad link
Signed-off-by: Dj Walker-Morgan
---
.../ai-accelerator/capabilities/auto-processing.mdx | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx
index d5ec8c3f827..947dead787b 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/capabilities/auto-processing.mdx
@@ -127,7 +127,7 @@ In Background and Disabled modes, (auto) processing happens in batches of config
All records within each batch are processed in parallel wherever possible. This means pipeline steps like data retrieval, embeddings computation, and storing embeddings will run as parallel operations.
E.g., when using a table as a data source, a batch of input records will be retrieved with a single query. With a volume source, concurrent requests will be used to retrieve a batch of records.
-Our [knowledge base pipeline performance tuning guide](knowledge_base/performance_tuning) explains how the batch size can be tuned for optimal throughput.
+Our [knowledge base pipeline performance tuning guide](../knowledge_base/performance_tuning) explains how the batch size can be tuned for optimal throughput.
## Change detection
AIDB auto-processing is designed around change detection mechanisms for table and volume data sources. This allows it to only
From fa886146651d5235009685bca723811c20fc1e2e Mon Sep 17 00:00:00 2001
From: Dj Walker-Morgan
Date: Mon, 19 May 2025 15:35:30 +0100
Subject: [PATCH 42/42] fix rel notes typo
Signed-off-by: Dj Walker-Morgan
---
.../ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx | 2 +-
.../ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
index 51174a0fb6e..3590058bdbe 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/ai-accelerator_4.1.0_rel_notes.mdx
@@ -19,7 +19,7 @@ This is a minor release that includes enhancements to the preparer pipeline and
Description | Addresses |
Automatic unnesting of Preparer results for operations that transform the shape of data.
The preparer pipeline for operations that transform the shape of their input data with an additional dimension now unnest their result collections.
This allows the output of preparers to be consumed much more easily by other preparers or knowledge bases.
-Unnested results are returned with a new part_id column to track the new dimension. There is also a new unique_id column to unqiuely identify the combination of the source key and part_id.
+Unnested results are returned with a new part_id column to track the new dimension. There is also a new unique_id column to uniquely identify the combination of the source key and part_id.
| |
Batch processing for embeddings with external models.
The external model providers embeddings , openai_embeddings , and nim_embeddings can now send a batch of inputs in a single request, rather than multiple concurrent requests.
This can improve performance and hardware utilization. The feature is fully configurable and can also be disabled.
diff --git a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
index be37e6ba9ff..edbcf329ba0 100644
--- a/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
+++ b/advocacy_docs/edb-postgres-ai/ai-accelerator/rel_notes/src/rel_notes_4.1.0.yml
@@ -12,7 +12,7 @@ relnotes:
details: |
The preparer pipeline for operations that transform the shape of their input data with an additional dimension now unnest their result collections.
This allows the output of preparers to be consumed much more easily by other preparers or knowledge bases.
- Unnested results are returned with a new `part_id` column to track the new dimension. There is also a new `unique_id` column to unqiuely identify the combination of the source key and part_id.
+ Unnested results are returned with a new `part_id` column to track the new dimension. There is also a new `unique_id` column to uniquely identify the combination of the source key and part_id.
jira: "AID-410"
addresses: ""
type: Enhancement
|
|