From e6962b42c312d12c77c565b4de7865f883a017a1 Mon Sep 17 00:00:00 2001
From: Jyoti Verma
Date: Sat, 7 Sep 2024 03:25:34 +0000
Subject: [PATCH 1/2] added known issues for sharding

---
 docs/sharding/README.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/docs/sharding/README.md b/docs/sharding/README.md
index a5c7c470..fd02fbe6 100644
--- a/docs/sharding/README.md
+++ b/docs/sharding/README.md
@@ -33,6 +33,7 @@ Following sections provide the details for deploying Oracle Globally Distributed
 * [Provisioning Oracle Globally Distributed Database System-Managed Sharding with Raft replication enabled in a Cloud-Based Kubernetes Cluster](#provisioning-oracle-globally-distributed-database-topology-with-system-managed-sharding-and-raft-replication-enabled-in-a-cloud-based-kubernetes-cluster)
 * [Connecting to Shard Databases](#connecting-to-shard-databases)
 * [Debugging and Troubleshooting](#debugging-and-troubleshooting)
+* [Known Issues](#known-issues)
 
 **Note** Before proceeding to the next section, you must complete the instructions given in each section, based on your enviornment, before proceeding to next section.
 
@@ -187,3 +188,10 @@ After the Oracle Globally Distributed Database Topology has been provisioned usi
 ## Debugging and Troubleshooting
 
 To debug the Oracle Globally Distributed Database Topology provisioned using the Sharding Controller of Oracle Database Kubernetes Operator, follow this document: [Debugging and troubleshooting](./provisioning/debugging.md)
+
+## Known Issues
+
+* Issue 1: For both ENTERPRISE and FREE Images, if the GSM Pod is stopped using "crictl stopp" at the worker node level, GSM is left in a failed state and the "gdsctl" commands fail with the error "GSM-45034: Connection to GDS catalog is not established". This is because, after this operation, the network namespace is lost when checked from the GSM Pod.
+* Issue 2: For both ENTERPRISE and FREE Images, rebooting the node running the CATALOG using "/sbin/reboot -f" results in "GSM-45076: GSM IS NOT RUNNING". After waiting for a certain time, the "gdsctl" commands start working again once the database connection is restored. After the stack comes up following the node reboot, an unexpected restart of the GSM Pod is also observed after some time.
+* Issue 3: For both ENTERPRISE and FREE Images, rebooting the node running the SHARD Pod using "/sbin/reboot -f", or stopping the Shard Database Pod from the worker node using the "crictl stopp" command, leaves the shard in an error state.
+* Issue 4: For both ENTERPRISE and FREE Images, the GSM Pod restarts multiple times after force rebooting the node running the GSM Pod. This is because, when the worker node comes back up, the GSM Pod is recreated but does not get a database connection to the Catalog; meanwhile, the liveness probe fails, which restarts the Pod.
\ No newline at end of file

From dd79013d308539e269b6a39af304d406227d7e30 Mon Sep 17 00:00:00 2001
From: Jyoti Verma
Date: Tue, 10 Sep 2024 20:05:04 +0000
Subject: [PATCH 2/2] added the known issue change

---
 docs/sharding/README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/sharding/README.md b/docs/sharding/README.md
index fd02fbe6..076d7e32 100644
--- a/docs/sharding/README.md
+++ b/docs/sharding/README.md
@@ -191,7 +191,7 @@ To debug the Oracle Globally Distributed Database Topology provisioned using the
 
 ## Known Issues
 
-* Issue 1: For both ENTERPRISE and FREE Images, if the GSM Pod is stopped using "crictl stopp" at the worker node level, GSM is left in a failed state and the "gdsctl" commands fail with the error "GSM-45034: Connection to GDS catalog is not established". This is because, after this operation, the network namespace is lost when checked from the GSM Pod.
-* Issue 2: For both ENTERPRISE and FREE Images, rebooting the node running the CATALOG using "/sbin/reboot -f" results in "GSM-45076: GSM IS NOT RUNNING". After waiting for a certain time, the "gdsctl" commands start working again once the database connection is restored. After the stack comes up following the node reboot, an unexpected restart of the GSM Pod is also observed after some time.
-* Issue 3: For both ENTERPRISE and FREE Images, rebooting the node running the SHARD Pod using "/sbin/reboot -f", or stopping the Shard Database Pod from the worker node using the "crictl stopp" command, leaves the shard in an error state.
-* Issue 4: For both ENTERPRISE and FREE Images, the GSM Pod restarts multiple times after force rebooting the node running the GSM Pod. This is because, when the worker node comes back up, the GSM Pod is recreated but does not get a database connection to the Catalog; meanwhile, the liveness probe fails, which restarts the Pod.
\ No newline at end of file
+* For both ENTERPRISE and FREE Images, if the GSM Pod is stopped using `crictl stopp` at the worker node level, GSM is left in a failed state and the `gdsctl` commands fail with the error **GSM-45034: Connection to GDS catalog is not established**. This is because, after this operation, the network namespace is lost when checked from the GSM Pod.
+* For both ENTERPRISE and FREE Images, rebooting the node running the CATALOG using `/sbin/reboot -f` results in **GSM-45076: GSM IS NOT RUNNING**. After waiting for a certain time, the `gdsctl` commands start working again once the database connection is restored. After the stack comes up following the node reboot, an unexpected restart of the GSM Pod is also observed after some time.
+* For both ENTERPRISE and FREE Images, rebooting the node running the SHARD Pod using `/sbin/reboot -f`, or stopping the Shard Database Pod from the worker node using the `crictl stopp` command, leaves the shard in an error state.
+* For both ENTERPRISE and FREE Images, the GSM Pod restarts multiple times after force rebooting the node running the GSM Pod. This is because, when the worker node comes back up, the GSM Pod is recreated but does not get a database connection to the Catalog; meanwhile, the liveness probe fails, which restarts the Pod.
\ No newline at end of file
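
For reference, a minimal sketch of how the GSM state described in these known issues can be inspected. The namespace `shns` and the GSM Pod name `gsm1-0` are assumptions based on a typical sharding deployment and will likely differ in your environment:

```bash
# Assumed names: namespace "shns" and GSM Pod "gsm1-0"; adjust for your deployment.
# Check the Pod status reported by Kubernetes.
kubectl get pods -n shns

# Query GSM health from inside the GSM Pod; the GSM-45034 / GSM-45076 errors
# described above appear here when the connection to the GDS catalog is lost
# (gdsctl is assumed to be on the PATH inside the GSM container).
kubectl exec -it gsm1-0 -n shns -- bash -c "gdsctl status gsm"

# On the worker node, the operation that triggers the first and third issues:
# list the pod sandboxes and stop one by its ID.
crictl pods
crictl stopp <POD_ID>
```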