DOCSP-33391 Sharded Backup with Filesystem Snapshot (#5553) (#5949)

kennethdyer · mdb-ashley · web-flow · commit 06e7bb9083fc · 2024-01-24T12:50:27.000-05:00
* DOCSP-33391 Fixes filesystem snapshot text
* Adds step to find a backup window
* Reworks procedure for filesystem snapshot
* Refactors filesystem backup
* removes deprecated YAML
* fixes build error
* fixes build error
* Fixes per Ian
* Fixes per Ashley
* Fixes per Ashley
* Fixes per Ashley
* Fixes build issues
* Fixes per Nandini
* Fixes per Nandini
* Fixes spacing issue
* Vale checks

---------

Co-authored-by: Ashley Brown <98361885+mdb-ashley@users.noreply.github.com>
diff --git a/source/includes/note-shard-cluster-backup.rst b/source/includes/note-shard-cluster-backup.rst
@@ -1,4 +1,3 @@
-.. important:: 
+.. important::
 
-   To capture a consistent backup from a sharded
-   cluster you **must** stop *all* writes to the cluster. 
+   To back up a sharded cluster, you **must** stop *all* writes to the cluster.
diff --git a/source/includes/sharded-clusters-backup-restore-file-system-snapshot-restriction.rst b/source/includes/sharded-clusters-backup-restore-file-system-snapshot-restriction.rst
@@ -1,7 +1,8 @@
-In MongoDB 4.2+, you cannot use :doc:`file system snapshots
-</tutorial/backup-with-filesystem-snapshots>` for backups that involve
-transactions across shards because those backups do not maintain
-atomicity. Instead, use one of the following to perform the backups:
+To take a backup with a file system snapshot, you must first stop the balancer,
+stop writes, and stop any schema transformation operations on the cluster.
+
+MongoDB provides backup and restore operations that can run while the balancer
+and transactions are active through the following services:
 
 - `MongoDB Atlas <https://docs.atlas.mongodb.com/>`_
 
diff --git a/source/tutorial/backup-sharded-cluster-with-filesystem-snapshots.txt b/source/tutorial/backup-sharded-cluster-with-filesystem-snapshots.txt
@@ -6,15 +6,12 @@ Back Up a Sharded Cluster with File System Snapshots
 
 .. default-domain:: mongodb
 
-
-
 .. contents:: On this page
    :local:
    :backlinks: none
    :depth: 1
    :class: singlecol
 
-
 Overview
 --------
 
@@ -40,15 +37,15 @@ Encrypted Storage Engine (MongoDB Enterprise Only)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. include:: /includes/fact-aes256-backups.rst
-   
+
 Balancer
 ~~~~~~~~
 
 It is *essential* that you stop the :ref:`balancer
 <sharding-internals-balancing>` before capturing a backup.
 
 If the balancer is active while you capture backups, the backup
-artifacts may be incomplete and/or have duplicate data, as :term:`chunks
+artifacts may be incomplete or have duplicate data, as :term:`chunks
 <chunk>` may migrate while recording backups.
 
 Precision
@@ -58,28 +55,191 @@ In this procedure, you will stop the cluster balancer and take a backup
 of the :term:`config database`, and then take backups of each
 shard in the cluster using a file-system snapshot tool. If you need an
 exact moment-in-time snapshot of the system, you will need to stop all
-application writes before taking the file system snapshots; otherwise
-the snapshot will only approximate a moment in time.
-
-For approximate point-in-time snapshots, you can minimize the impact on
-the cluster by taking the backup from a secondary member of each
-replica set shard.
+writes before taking the file system snapshots; otherwise the snapshot will
+only approximate a moment in time.
 
 Consistency
 ~~~~~~~~~~~
 
-If the journal and data files are on the same logical volume, you can
-use a single point-in-time snapshot to capture a consistent copy of the
-data files.
-
-If the journal and data files are on different file systems, you must
-use :method:`db.fsyncLock()` and :method:`db.fsyncUnlock()` to ensure
-that the data files do not change, providing consistency for the
-purposes of creating backups.
+To back up a sharded cluster, you must use the :dbcommand:`fsync` command or
+:method:`db.fsyncLock` method to stop writes on the cluster. This ensures that
+data files do not change during the backup.
 
 .. include:: /includes/fact-backup-snapshots-with-ebs-in-raid10.rst
 
-Procedure
----------
+Steps
+-----
+
+To take a self-managed backup of a sharded cluster, complete the following
+steps:
+
+.. procedure::
+   :style: normal
+
+   .. step:: Find a Backup Window
+
+      Chunk migrations, resharding, and schema migration operations can cause
+      inconsistencies in backups. To find a good time to perform a backup,
+      monitor your application and database usage and find a time when these
+      operations are unlikely to occur.
+
+      For more information, see :ref:`sharded-schedule-backup`.
+
+   .. step:: Stop the Balancer
+
+      To prevent chunk migrations from disrupting the backup, use
+      the :method:`sh.stopBalancer` method to stop the balancer:
+
+      .. code-block:: javascript
+
+         sh.stopBalancer()
+
+      If a balancing round is currently in progress, the operation waits for
+      balancing to complete.
+
+      To confirm that the balancer is stopped, use the
+      :method:`sh.getBalancerState` method:
+
+      .. io-code-block::
+
+         .. input::
+            :language: javascript
+
+            sh.getBalancerState()
+
+         .. output::
+            :language: javascript
+
+            false
+
+      The method returns ``false`` when the balancer is stopped.
+
+   .. step:: Lock the Cluster
+
+      Writes to the database can cause backup inconsistencies. Lock your
+      sharded cluster to protect the database from writes.
+
+      To lock a sharded cluster, use the :method:`db.fsyncLock` method:
+
+      .. code-block:: javascript
+
+         db.getSiblingDB("admin").fsyncLock()
+
+      Run the following aggregation pipeline on both :program:`mongos` and
+      the primary :program:`mongod` of the config servers. To confirm the
+      lock, ensure that the ``fsyncLocked`` field returns ``true`` and the
+      ``fsyncUnlocked`` field returns ``false``.
+
+      .. io-code-block::
+
+         .. input::
+            :language: javascript
+
+            db.getSiblingDB("admin").aggregate( [
+               { $currentOp: { } },
+               { $facet: {
+                  "locked": [
+                     { $match: { $and: [
+                        { fsyncLock: { $exists: true } },
+                        { fsyncLock: true }
+                     ] } } ],
+                  "unlocked": [
+                     { $match: { fsyncLock: { $exists: false } } }
+                  ]
+               } },
+               { $project: {
+                  "fsyncLocked": { $gt: [ { $size: "$locked" }, 0 ] },
+                  "fsyncUnlocked": { $gt: [ { $size: "$unlocked" }, 0 ] }
+               } }
+            ] )
+
+         .. output::
+            :language: json
+
+            [ { fsyncLocked: true, fsyncUnlocked: false } ]
+
+   .. step:: Back up the Primary Config Server
+
+      .. note::
+
+         Backing up a :ref:`config server <sharding-config-server>` backs
+         up the sharded cluster's metadata. You only need to back up one
+         config server, as they all hold the same data. Perform this step
+         against the CSRS primary member.
+
+      To create a filesystem snapshot of the config server, follow the
+      procedure in :ref:`lvm-backup-operation`.
+
+   .. step:: Back up the Primary Shards
+
+      Perform a filesystem snapshot against the primary member of each shard,
+      using the procedure found in :ref:`backup-restore-filesystem-snapshots`.
+
+   .. step:: Unlock the Cluster
+
+      After the backup completes, you can unlock the cluster to allow writes
+      to resume.
+
+      To unlock the cluster, use the :method:`db.fsyncUnlock` method:
+
+      .. code-block:: javascript
+
+         db.getSiblingDB("admin").fsyncUnlock()
+
+      Run the following aggregation pipeline on both :program:`mongos` and
+      the primary :program:`mongod` of the config servers. To confirm the
+      unlock, ensure that the ``fsyncLocked`` field returns ``false`` and the
+      ``fsyncUnlocked`` field returns ``true``.
+
+      .. io-code-block::
+
+         .. input::
+            :language: javascript
+
+            db.getSiblingDB("admin").aggregate( [
+               { $currentOp: { } },
+               { $facet: {
+                  "locked": [
+                     { $match: { $and: [
+                        { fsyncLock: { $exists: true } },
+                        { fsyncLock: true }
+                     ] } } ],
+                  "unlocked": [
+                     { $match: { fsyncLock: { $exists: false } } }
+                  ]
+               } },
+               { $project: {
+                  "fsyncLocked": { $gt: [ { $size: "$locked" }, 0 ] },
+                  "fsyncUnlocked": { $gt: [ { $size: "$unlocked" }, 0 ] }
+               } }
+            ] )
+
+         .. output::
+            :language: json
+
+            [ { fsyncLocked: false, fsyncUnlocked: true } ]
+
+   .. step:: Restart the Balancer
+
+      To restart the balancer, use the :method:`sh.startBalancer` method:
+
+      .. code-block:: javascript
+
+         sh.startBalancer()
+
+      To confirm that the balancer is running, use the
+      :method:`sh.getBalancerState` method:
+
+      .. io-code-block::
+
+         .. input::
+            :language: javascript
+
+            sh.getBalancerState()
+
+         .. output::
+            :language: javascript
+
+            true
 
-.. include:: /includes/steps/backup-sharded-cluster-with-snapshots.rst
+      The method returns ``true`` when the balancer is running.
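
The ordering of the steps in the new procedure can be sketched as a single helper. This is an illustration only and is not part of the commit: the function name `backupShardedCluster` and the `takeSnapshots` callback are hypothetical, and the sketch assumes it runs in `mongosh` connected to a `mongos`, where `sh` and `db` are the shell's built-in helpers. The snapshot step stands in for whatever file system snapshot tool you use.

```javascript
// Sketch of the backup sequence from the procedure above.
// `sh` and `db` are the mongosh helpers, passed in explicitly so the
// function can also be exercised with stand-in objects.
// `takeSnapshots` is a hypothetical callback that snapshots the config
// server and each shard.
function backupShardedCluster(sh, db, takeSnapshots) {
  const admin = db.getSiblingDB("admin");

  // Stop the balancer so chunk migrations cannot run during the backup.
  sh.stopBalancer();
  try {
    if (sh.getBalancerState() !== false) {
      throw new Error("balancer is still enabled");
    }

    // Block writes, take the snapshots, then always release the lock.
    admin.fsyncLock();
    try {
      takeSnapshots();
    } finally {
      admin.fsyncUnlock();
    }
  } finally {
    // Restart the balancer whether or not the backup succeeded.
    sh.startBalancer();
  }
}
```

In `mongosh`, a call might look like `backupShardedCluster(sh, db, runMySnapshotTool)`, where `runMySnapshotTool` is a placeholder for your own snapshot step. The nested `try`/`finally` blocks are a design choice for this sketch: they make sure a failed snapshot does not leave the cluster locked or the balancer stopped.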