
Commit 06e7bb9

DOCSP-33391 Sharded Backup with Filesystem Snapshot (#5553) (#5949)
* DOCSP-33391 Fixes filesystem snapshot text
* Adds step to find a backup window
* Reworks procedure for filesystem snapshot
* Refactors filesystem backup
* removes deprecated YAML
* fixes build error
* fixes build error
* Fixes per Ian
* Fixes per Ashley
* Fixes per Ashley
* Fixes per Ashley
* Fixes build issues
* Fixes per Nandini
* Fixes per Nandini
* Fixes spacing issue
* Vale checks

---------

Co-authored-by: Ashley Brown <[email protected]>
1 parent ca8a7a9 commit 06e7bb9

3 files changed: +189 -29 lines changed

Lines changed: 2 additions & 3 deletions
@@ -1,4 +1,3 @@
1-
.. important::
1+
.. important::
22

3-
To capture a consistent backup from a sharded
4-
cluster you **must** stop *all* writes to the cluster.
3+
To back up a sharded cluster you **must** stop *all* writes to the cluster.

source/includes/sharded-clusters-backup-restore-file-system-snapshot-restriction.rst

Lines changed: 5 additions & 4 deletions
@@ -1,7 +1,8 @@
1-
In MongoDB 4.2+, you cannot use :doc:`file system snapshots
2-
</tutorial/backup-with-filesystem-snapshots>` for backups that involve
3-
transactions across shards because those backups do not maintain
4-
atomicity. Instead, use one of the following to perform the backups:
1+
To take a backup with a file system snapshot, you must first stop the balancer,
2+
stop writes, and stop any schema transformation operations on the cluster.
3+
4+
MongoDB provides backup and restore operations that can run with the balancer
5+
and running transactions through the following services:
56

67
- `MongoDB Atlas <https://docs.atlas.mongodb.com/>`_
78

source/tutorial/backup-sharded-cluster-with-filesystem-snapshots.txt

Lines changed: 182 additions & 22 deletions
@@ -6,15 +6,12 @@ Back Up a Sharded Cluster with File System Snapshots
66

77
.. default-domain:: mongodb
88

9-
10-
119
.. contents:: On this page
1210
:local:
1311
:backlinks: none
1412
:depth: 1
1513
:class: singlecol
1614

17-
1815
Overview
1916
--------
2017

@@ -40,15 +37,15 @@ Encrypted Storage Engine (MongoDB Enterprise Only)
4037
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4138

4239
.. include:: /includes/fact-aes256-backups.rst
43-
40+
4441
Balancer
4542
~~~~~~~~
4643

4744
It is *essential* that you stop the :ref:`balancer
4845
<sharding-internals-balancing>` before capturing a backup.
4946

5047
If the balancer is active while you capture backups, the backup
51-
artifacts may be incomplete and/or have duplicate data, as :term:`chunks
48+
artifacts may be incomplete or have duplicate data, as :term:`chunks
5249
<chunk>` may migrate while recording backups.
5350

5451
Precision
@@ -58,28 +55,191 @@ In this procedure, you will stop the cluster balancer and take a backup
5855
of the :term:`config database`, and then take backups of each
5956
shard in the cluster using a file-system snapshot tool. If you need an
6057
exact moment-in-time snapshot of the system, you will need to stop all
61-
application writes before taking the file system snapshots; otherwise
62-
the snapshot will only approximate a moment in time.
63-
64-
For approximate point-in-time snapshots, you can minimize the impact on
65-
the cluster by taking the backup from a secondary member of each
66-
replica set shard.
58+
writes before taking the file system snapshots; otherwise the snapshot will
59+
only approximate a moment in time.
6760

6861
Consistency
6962
~~~~~~~~~~~
7063

71-
If the journal and data files are on the same logical volume, you can
72-
use a single point-in-time snapshot to capture a consistent copy of the
73-
data files.
74-
75-
If the journal and data files are on different file systems, you must
76-
use :method:`db.fsyncLock()` and :method:`db.fsyncUnlock()` to ensure
77-
that the data files do not change, providing consistency for the
78-
purposes of creating backups.
64+
To back up a sharded cluster, you must use the :dbcommand:`fsync` command or
65+
:method:`db.fsyncLock` method to stop writes on the cluster. This ensures that
66+
data files do not change during the backup.
7967

8068
.. include:: /includes/fact-backup-snapshots-with-ebs-in-raid10.rst
8169

82-
Procedure
83-
---------
70+
Steps
71+
-----
72+
73+
To take a self-managed backup of a sharded cluster, complete the following
74+
steps:
75+
76+
.. procedure::
77+
:style: normal
78+
79+
.. step:: Find a Backup Window
80+
81+
Chunk migrations, resharding, and schema migration operations can cause
82+
inconsistencies in backups. To find a good time to perform a backup,
83+
monitor your application and database usage and find a time when these
84+
operations are unlikely to occur.
85+
86+
For more information, see :ref:`sharded-schedule-backup`.
87+
88+
.. step:: Stop the Balancer
89+
90+
To prevent chunk migrations from disrupting the backup, use
91+
the :method:`sh.stopBalancer` method to stop the balancer:
92+
93+
.. code-block:: javascript
94+
95+
sh.stopBalancer()
96+
97+
If a balancing round is currently in progress, the operation waits for
98+
balancing to complete.
99+
100+
To confirm that the balancer is stopped, use the
101+
:method:`sh.getBalancerState` method:
102+
103+
.. io-code-block::
104+
105+
.. input::
106+
:language: javascript
107+
108+
sh.getBalancerState()
109+
110+
.. output::
111+
:language: javascript
112+
113+
false
114+
115+
The command returns ``false`` when the balancer is stopped.
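
The stop-and-confirm sequence in this step could be wrapped in a small helper. The following is a minimal sketch and is not part of the commit: ``stopBalancerAndConfirm`` is a hypothetical name, and the function takes the ``sh`` helper as a parameter so that it can be exercised with a stub outside of ``mongosh``.

```javascript
// Hypothetical helper (an assumption, not part of the MongoDB docs):
// stop the balancer and fail loudly if it still reports as enabled.
// `sh` must expose stopBalancer() and getBalancerState(), as the
// mongosh `sh` helper does.
function stopBalancerAndConfirm(sh) {
  // Waits for any in-progress balancing round, per the step above.
  sh.stopBalancer();

  // getBalancerState() returns false once the balancer is stopped.
  const enabled = sh.getBalancerState();
  if (enabled !== false) {
    throw new Error("Balancer is still enabled; do not start the backup.");
  }
  return enabled;
}
```

In ``mongosh`` connected to a ``mongos``, the call would be ``stopBalancerAndConfirm(sh)``.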
116+
117+
.. step:: Lock the Cluster
118+
119+
Writes to the database can cause backup inconsistencies. Lock your
120+
sharded cluster to protect the database from writes.
121+
122+
To lock a sharded cluster, use the :method:`db.fsyncLock` method:
123+
124+
.. code-block:: javascript
125+
126+
db.getSiblingDB("admin").fsyncLock()
127+
128+
Run the following aggregation pipeline on both :program:`mongos` and
129+
the primary :program:`mongod` of the config servers. To confirm the
130+
lock, ensure that the ``fsyncLocked`` field returns ``true`` and the
131+
``fsyncUnlocked`` field returns ``false``.
132+
133+
.. io-code-block::
134+
135+
.. input::
136+
:language: javascript
137+
138+
db.getSiblingDB("admin").aggregate( [
139+
{ $currentOp: { } },
140+
{ $facet: {
141+
"locked": [
142+
{ $match: { $and: [
143+
{ fsyncLock: { $exists: true } },
144+
{ fsyncLock: true }
145+
] } }],
146+
"unlocked": [
147+
{ $match: { fsyncLock: { $exists: false } } }
148+
]
149+
} },
150+
{ $project: {
151+
"fsyncLocked": { $gt: [ { $size: "$locked" }, 0 ] },
152+
"fsyncUnlocked": { $gt: [ { $size: "$unlocked" }, 0 ] }
153+
} }
154+
] )
155+
156+
.. output::
157+
:language: json
158+
159+
[ { fsyncLocked: true }, { fsyncUnlocked: false } ]
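
To make the logic of the ``$facet`` and ``$project`` stages easier to follow, here is a plain-JavaScript sketch of the same checks applied to an in-memory array. It is an illustration only: ``summarizeFsyncState`` is a hypothetical name, the input stands in for documents returned by ``$currentOp``, and the sketch returns both flags in one object for convenience.

```javascript
// Hypothetical mirror of the pipeline's $facet/$project logic (an assumption
// for illustration; the real check runs server-side via aggregate()).
// `ops` is an array of $currentOp-style documents.
function summarizeFsyncState(ops) {
  // "locked" facet: fsyncLock exists and is true.
  const locked = ops.filter((op) => op.fsyncLock === true);

  // "unlocked" facet: the fsyncLock field is absent.
  const unlocked = ops.filter((op) => !("fsyncLock" in op));

  // $project: each flag is true when its facet matched at least one document.
  return {
    fsyncLocked: locked.length > 0,
    fsyncUnlocked: unlocked.length > 0,
  };
}
```

For example, ``summarizeFsyncState([{ fsyncLock: true }])`` reports the locked state, while an array whose documents have no ``fsyncLock`` field reports the unlocked state.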
160+
161+
.. step:: Back up the Primary Config Server
162+
163+
.. note::
164+
165+
Backing up a :ref:`config server <sharding-config-server>` backs
166+
up the sharded cluster's metadata. You only need to back up one
167+
config server, as they all hold the same data. Perform this step
168+
against the CSRS primary member.
169+
170+
To create a filesystem snapshot of the config server, follow the
171+
procedure in :ref:`lvm-backup-operation`.
172+
173+
.. step:: Back up the Primary Shards
174+
175+
Perform a filesystem snapshot against the primary member of each shard,
176+
using the procedure found in :ref:`backup-restore-filesystem-snapshots`.
177+
178+
.. step:: Unlock the Cluster
179+
180+
After the backup completes, you can unlock the cluster to allow writes
181+
to resume.
182+
183+
To unlock the cluster, use the :method:`db.fsyncUnlock` method:
184+
185+
.. code-block:: javascript
186+
187+
db.getSiblingDB("admin").fsyncUnlock()
188+
189+
Run the following aggregation pipeline on both :program:`mongos` and
190+
the primary :program:`mongod` of the config servers. To confirm the
191+
unlock, ensure that the ``fsyncLocked`` field returns ``false`` and the
192+
``fsyncUnlocked`` field returns ``true``.
193+
194+
.. io-code-block::
195+
196+
.. input::
197+
:language: javascript
198+
199+
db.getSiblingDB("admin").aggregate( [
200+
{ $currentOp: { } },
201+
{ $facet: {
202+
"locked": [
203+
{ $match: { $and: [
204+
{ fsyncLock: { $exists: true } },
205+
{ fsyncLock: true }
206+
] } }],
207+
"unlocked": [
208+
{ $match: { fsyncLock: { $exists: false } } }
209+
]
210+
} },
211+
{ $project: {
212+
"fsyncLocked": { $gt: [ { $size: "$locked" }, 0 ] },
213+
"fsyncUnlocked": { $gt: [ { $size: "$unlocked" }, 0 ] }
214+
} }
215+
] )
216+
217+
.. output::
218+
:language: json
219+
220+
[ { fsyncLocked: false }, { fsyncUnlocked: true } ]
221+
222+
.. step:: Restart the Balancer
223+
224+
To restart the balancer, use the :method:`sh.startBalancer` method:
225+
226+
.. code-block:: javascript
227+
228+
sh.startBalancer()
229+
230+
To confirm that the balancer is running, use the
231+
:method:`sh.getBalancerState` method:
232+
233+
.. io-code-block::
234+
235+
.. input::
236+
:language: javascript
237+
238+
sh.getBalancerState()
239+
240+
.. output::
241+
:language: javascript
242+
243+
true
84244

85-
.. include:: /includes/steps/backup-sharded-cluster-with-snapshots.rst
245+
The command returns ``true`` when the balancer is running.
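
The ordering of the steps in this procedure matters: the balancer stops before the lock, and the unlock happens before the balancer restarts. One way to keep that ordering, even if the snapshot step fails, is sketched below. This is not part of the commit: ``runLockedBackup`` and ``takeSnapshots`` are hypothetical names, and ``takeSnapshots`` is a caller-supplied function that triggers the file system snapshots with whatever tool is in use.

```javascript
// Hypothetical wrapper (an assumption, not a documented MongoDB API).
// `sh` and `db` are the mongosh globals, passed in so the function can be
// tested with stubs. `takeSnapshots` is supplied by the caller.
function runLockedBackup(sh, db, takeSnapshots) {
  const admin = db.getSiblingDB("admin");

  // Stop the balancer first so chunks do not migrate during the backup.
  sh.stopBalancer();
  try {
    // Lock the cluster against writes.
    admin.fsyncLock();
    try {
      // Snapshot the config server primary and each shard primary.
      return takeSnapshots();
    } finally {
      // Always release the lock, even if the snapshot step throws.
      admin.fsyncUnlock();
    }
  } finally {
    // Always restart the balancer once the lock is released.
    sh.startBalancer();
  }
}
```

The nested ``try``/``finally`` blocks are a design choice for this sketch: they keep a failed snapshot from leaving the cluster locked or the balancer stopped. The confirmation checks described in the steps above are omitted here for brevity.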
