| 1 | +--- |
| 2 | +title: Deployment & Operations Skills Taxonomy |
| 3 | +summary: Learn the foundational skills required to deploy and operate CockroachDB |
| 4 | +toc: true |
| 5 | +docs_area: deploy |
| 6 | +--- |
| 7 | + |
| 8 | +This document outlines the foundational skills required to deploy and operate CockroachDB in production environments. |
| 9 | + |
| 10 | +The skills are organized into sections based on the following operational domains: |
| 11 | + |
| 12 | +- [Infrastructure configuration](#infrastructure-configuration) |
| 13 | +- [Security](#security) |
| 14 | +- [Cluster maintenance](#cluster-maintenance) |
| 15 | +- [Troubleshooting](#troubleshooting) |
| 16 | +- [Disaster recovery](#disaster-recovery) |
| 17 | + |
| 18 | +Each section includes links to relevant documentation for the listed skills. |
| 19 | + |
| 20 | +{{site.data.alerts.callout_success}} |
| 21 | +Cockroach Labs offers [Professional Services](https://www.cockroachlabs.com/company/professional-services/) that can assist you with getting applications into production faster and more efficiently. |
| 22 | +{{site.data.alerts.end}} |
| 23 | + |
| 24 | +## Infrastructure configuration |
| 25 | + |
| 26 | +This section covers how to ensure that your hardware and network are properly configured to meet the performance and connectivity requirements of CockroachDB. |
| 27 | + |
| 28 | +- [Verify vCPU, RAM, storage, and disk IOPS performance]({% link {{ page.version.version }}/recommended-production-settings.md %}#hardware) |
| 29 | +- [Configure time synchronization with an NTP server]({% link {{ page.version.version }}/deploy-cockroachdb-on-premises.md %}#step-1-synchronize-clocks) |
| 30 | +- [Validate network connectivity]({% link {{ page.version.version }}/known-limitations.md %}#cockroachdb-does-not-test-for-all-connection-failure-scenarios) |
| 31 | + |
| 32 | +## Security |
| 33 | + |
| 34 | +This section covers how to secure a CockroachDB deployment, including certificate management, load balancing setup, role-based access control, and data encryption. |
| 35 | + |
| 36 | +- [Create and distribute certificates; initialize cluster]({% link {{ page.version.version }}/deploy-cockroachdb-on-premises.md %}#step-2-generate-certificates) |
| 37 | +- [Configure load balancer and direct a workload]({% link {{ page.version.version }}/deploy-cockroachdb-on-premises.md %}#step-6-set-up-load-balancing) |
| 38 | +- [Configure RBAC]({% link {{ page.version.version }}/security-reference/authorization.md %}) |
| 39 | +- [Configure encryption at rest]({% link {{ page.version.version }}/encryption.md %}) |
| 40 | + |
| 41 | +## Cluster maintenance |
| 42 | + |
| 43 | +This section covers how to manage the lifecycle of CockroachDB nodes, including adding and removing nodes, handling outages, performing upgrades or downgrades, and modifying cluster settings. |
| 44 | + |
| 45 | +- [Shut down a node gracefully]({% link {{ page.version.version }}/node-shutdown.md %}) |
| 46 | +- [Handle unplanned node outages]({% link {{ page.version.version }}/recommended-production-settings.md %}#load-balancing) |
| 47 | +- [Add nodes]({% link {{ page.version.version }}/cockroach-start.md %}#add-a-node-to-a-cluster) |
| 48 | +- [Remove nodes]({% link {{ page.version.version }}/node-shutdown.md %}?filters=decommission#remove-nodes) |
| 49 | +- [Add a region]({% link {{ page.version.version }}/alter-database.md %}#add-regions-to-a-database) |
| 50 | +- [Remove a region]({% link {{ page.version.version }}/alter-database.md %}#drop-region) |
| 51 | +- [Perform a rolling upgrade]({% link {{ page.version.version }}/upgrade-cockroach-version.md %}) |
| 52 | +- Downgrade a cluster from a [patch or major version]({% link {{ page.version.version }}/upgrade-cockroach-version.md %}#step-5-roll-back-the-upgrade-optional) |
| 53 | +- [Change a cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#change-a-cluster-setting) |
| 54 | +- Repave a cluster: cluster repaving involves the following individual skills, which are also used during [rolling upgrades]({% link {{ page.version.version }}/upgrade-cockroach-version.md %}): |
| 55 | + 1. [Shut down a node gracefully]({% link {{ page.version.version }}/node-shutdown.md %}) |
| 56 | + 1. Optional but recommended: detach the [persistent volume]({% link {{ page.version.version }}/kubernetes-overview.md %}#kubernetes-terminology) (also known as a persistent disk) from the removed node's virtual machine (VM) |
| 57 | + 1. Delete the removed node's VM |
| 58 | + 1. Start a new VM |
| 59 | + 1. Reattach the persistent disk to the new VM (required only if you detached it in step 2) |
| 60 | + 1. [Add a node to the cluster]({% link {{ page.version.version }}/cockroach-start.md %}#add-a-node-to-a-cluster) from the new VM |
| 61 | + |
| 62 | +## Troubleshooting |
| 63 | + |
| 64 | +This section covers how to troubleshoot common issues related to SQL performance, cluster stability, memory usage, load balancing, client connections, and changefeed lag, and how to collect diagnostic data. |
| 65 | + |
| 66 | +- [SQL response time for specific queries]({% link {{ page.version.version }}/query-behavior-troubleshooting.md %}#query-issues) |
| 67 | +- [SQL throughput degradation across the board]({% link {{ page.version.version }}/query-behavior-troubleshooting.md %}#low-throughput) |
| 68 | +- [Cluster instability: Dead/suspect nodes]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues) |
| 69 | +- [Out of memory (OOM) problems]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#out-of-memory-oom-crash) |
| 70 | +- [Imbalanced cluster load]({% link {{ page.version.version }}/architecture/replication-layer.md %}#load-based-replica-rebalancing) |
| 71 | +- [End of file (EOF) errors]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#client-connection-issues) |
| 72 | +- [Changefeed is falling behind]({% link {{ page.version.version }}/advanced-changefeed-configuration.md %}#lagging-ranges) |
| 73 | +- [Gather diagnostic data from a "debug zip" file]({% link {{ page.version.version }}/cockroach-debug-zip.md %}) |
| 74 | +- [Collect timeseries diagnostic data from a "tsdump" file]({% link {{ page.version.version }}/cockroach-debug-tsdump.md %}) |
| 75 | + |
| 76 | +## Disaster recovery |
| 77 | + |
| 78 | +This section covers how to set up and manage backup and restore of your cluster to ensure data recovery in case of failures. |
| 79 | + |
| 80 | +- [Create an AWS IAM access key]({% link {{ page.version.version }}/cloud-storage-authentication.md %}) |
| 81 | +- [Create an S3 bucket for backup data]({% link {{ page.version.version }}/use-cloud-storage.md %}#amazon-s3-storage-classes) |
| 82 | +- [Take a full cluster backup to S3]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#full-backups) |
| 83 | +- [Take an incremental backup to S3]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#incremental-backups) |
| 84 | +- [Restore a cluster from S3]({% link {{ page.version.version }}/restore.md %}#restore-a-cluster) |
| 85 | + |
| 86 | +## See also |
| 87 | + |
| 88 | +- [Production Checklist]({% link {{ page.version.version }}/recommended-production-settings.md %}) |
| 89 | +- [Manual Deployment]({% link {{ page.version.version }}/manual-deployment.md %}) |
| 90 | +- [Deploy a Local Cluster from Binary (Secure)]({% link {{ page.version.version }}/secure-a-cluster.md %}) |
| 91 | +- [SQL Performance Best Practices]({% link {{ page.version.version }}/performance-best-practices-overview.md %}) |
| 92 | +- [Performance Tuning Recipes]({% link {{ page.version.version }}/performance-recipes.md %}) |
| 93 | +- [Troubleshoot Self-Hosted Setup]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}) |