@@ -6,15 +6,12 @@ Back Up a Sharded Cluster with File System Snapshots
.. default-domain:: mongodb

-
-
.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 1
   :class: singlecol

-
Overview
--------

@@ -40,15 +37,15 @@ Encrypted Storage Engine (MongoDB Enterprise Only)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. include:: /includes/fact-aes256-backups.rst
-
+
Balancer
~~~~~~~~

It is *essential* that you stop the :ref:`balancer
<sharding-internals-balancing>` before capturing a backup.

If the balancer is active while you capture backups, the backup
- artifacts may be incomplete and/ or have duplicate data, as :term:`chunks
+ artifacts may be incomplete or have duplicate data, as :term:`chunks
<chunk>` may migrate while recording backups.

Precision
@@ -58,28 +55,191 @@ In this procedure, you will stop the cluster balancer and take a backup
up of the :term:`config database`, and then take backups of each
shard in the cluster using a file-system snapshot tool. If you need an
exact moment-in-time snapshot of the system, you will need to stop all
- application writes before taking the file system snapshots; otherwise
- the snapshot will only approximate a moment in time.
-
- For approximate point-in-time snapshots, you can minimize the impact on
- the cluster by taking the backup from a secondary member of each
- replica set shard.
+ writes before taking the file system snapshots; otherwise the snapshot will
+ only approximate a moment in time.

Consistency
~~~~~~~~~~~

- If the journal and data files are on the same logical volume, you can
- use a single point-in-time snapshot to capture a consistent copy of the
- data files.
-
- If the journal and data files are on different file systems, you must
- use :method:`db.fsyncLock()` and :method:`db.fsyncUnlock()` to ensure
- that the data files do not change, providing consistency for the
- purposes of creating backups.
+ To back up a sharded cluster, you must use the :dbcommand:`fsync` command or
+ the :method:`db.fsyncLock` method to stop writes on the cluster. This ensures
+ that data files do not change during the backup.
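+
+ As a minimal sketch, assuming you are connected to a :program:`mongos`
+ with ``mongosh``, the command form of the lock and its release might look
+ like this:
+
+ .. code-block:: javascript
+
+    // Flush pending writes to disk and block new writes across the cluster
+    db.adminCommand( { fsync: 1, lock: true } )
+
+    // After the backup completes, release the lock
+    db.adminCommand( { fsyncUnlock: 1 } )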

.. include:: /includes/fact-backup-snapshots-with-ebs-in-raid10.rst

- Procedure
- ---------
+ Steps
+ -----
+
+ To take a self-managed backup of a sharded cluster, complete the following
+ steps:
+
+ .. procedure::
+    :style: normal
+
+    .. step:: Find a Backup Window
+
+       Chunk migrations, resharding, and schema migration operations can cause
+       inconsistencies in backups. To find a good time to perform a backup,
+       monitor your application and database usage and find a time when these
+       operations are unlikely to occur.
+
+       For more information, see :ref:`sharded-schedule-backup`.
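+
+       As a rough, hypothetical check (``config.changelog`` is internal
+       cluster metadata, so its format may vary by version), you can count
+       recent chunk migration events; a quiet changelog suggests a
+       reasonable backup window:
+
+       .. code-block:: javascript
+
+          // Count chunk migration events logged in the last hour
+          db.getSiblingDB("config").changelog.countDocuments( {
+             what: /moveChunk/,
+             time: { $gt: new Date( Date.now() - 60 * 60 * 1000 ) }
+          } )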
+
+    .. step:: Stop the Balancer
+
+       To prevent chunk migrations from disrupting the backup, use
+       the :method:`sh.stopBalancer` method to stop the balancer:
+
+       .. code-block:: javascript
+
+          sh.stopBalancer()
+
+       If a balancing round is currently in progress, the operation waits for
+       balancing to complete.
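+
+       If you want to bound how long the method waits, a sketch passing the
+       optional timeout (in milliseconds) might look like this:
+
+       .. code-block:: javascript
+
+          // Wait up to 60 seconds for in-progress balancing to finish
+          sh.stopBalancer( 60000 )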
+
+       To confirm that the balancer is stopped, use the
+       :method:`sh.getBalancerState` method:
+
+       .. io-code-block::
+
+          .. input::
+             :language: javascript
+
+             sh.getBalancerState()
+
+          .. output::
+             :language: javascript
+
+             false
+
+       The command returns ``false`` when the balancer is stopped.
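+
+       If you also want to check that no balancing round is still in
+       progress, :method:`sh.isBalancerRunning` is one option; in recent
+       ``mongosh`` versions it returns a balancer status document whose
+       ``inBalancerRound`` field should be ``false``:
+
+       .. code-block:: javascript
+
+          // Reports balancer status, including whether a round is in progress
+          sh.isBalancerRunning()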
+
+    .. step:: Lock the Cluster
+
+       Writes to the database can cause backup inconsistencies. Lock your
+       sharded cluster to protect the database from writes.
+
+       To lock a sharded cluster, use the :method:`db.fsyncLock` method:
+
+       .. code-block:: javascript
+
+          db.getSiblingDB("admin").fsyncLock()
+
+       Run the following aggregation pipeline on both :program:`mongos` and
+       the primary :program:`mongod` of the config servers. To confirm the
+       lock, ensure that the ``fsyncLocked`` field returns ``true`` and the
+       ``fsyncUnlocked`` field returns ``false``.
+
+       .. io-code-block::
+
+          .. input::
+             :language: javascript
+
+             db.getSiblingDB("admin").aggregate( [
+                { $currentOp: { } },
+                { $facet: {
+                   "locked": [
+                      { $match: { $and: [
+                         { fsyncLock: { $exists: true } },
+                         { fsyncLock: true }
+                      ] } }],
+                   "unlocked": [
+                      { $match: { fsyncLock: { $exists: false } } }
+                   ]
+                } },
+                { $project: {
+                   "fsyncLocked": { $gt: [ { $size: "$locked" }, 0 ] },
+                   "fsyncUnlocked": { $gt: [ { $size: "$unlocked" }, 0 ] }
+                } }
+             ] )
+
+          .. output::
+             :language: json
+
+             [ { fsyncLocked: true }, { fsyncUnlocked: false } ]
+
+    .. step:: Back Up the Primary Config Server
+
+       .. note::
+
+          Backing up a :ref:`config server <sharding-config-server>` backs
+          up the sharded cluster's metadata. You only need to back up one
+          config server, as they all hold the same data. Perform this step
+          against the CSRS primary member.
+
+       To create a filesystem snapshot of the config server, follow the
+       procedure in :ref:`lvm-backup-operation`.
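+
+       As a quick sketch, one way to identify the current primary of the
+       config server replica set (or of any shard) is to run
+       :method:`db.hello` while connected to a member of that replica set:
+
+       .. code-block:: javascript
+
+          // "primary" reports the host:port of the current primary member
+          db.hello().primary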
+
+    .. step:: Back Up the Primary Shards
+
+       Perform a filesystem snapshot against the primary member of each shard,
+       using the procedure found in :ref:`backup-restore-filesystem-snapshots`.
+
+    .. step:: Unlock the Cluster
+
+       After the backup completes, you can unlock the cluster to allow writes
+       to resume.
+
+       To unlock the cluster, use the :method:`db.fsyncUnlock` method:
+
+       .. code-block:: javascript
+
+          db.getSiblingDB("admin").fsyncUnlock()
+
+       Run the following aggregation pipeline on both :program:`mongos` and
+       the primary :program:`mongod` of the config servers. To confirm the
+       unlock, ensure that the ``fsyncLocked`` field returns ``false`` and the
+       ``fsyncUnlocked`` field returns ``true``.
+
+       .. io-code-block::
+
+          .. input::
+             :language: javascript
+
+             db.getSiblingDB("admin").aggregate( [
+                { $currentOp: { } },
+                { $facet: {
+                   "locked": [
+                      { $match: { $and: [
+                         { fsyncLock: { $exists: true } },
+                         { fsyncLock: true }
+                      ] } }],
+                   "unlocked": [
+                      { $match: { fsyncLock: { $exists: false } } }
+                   ]
+                } },
+                { $project: {
+                   "fsyncLocked": { $gt: [ { $size: "$locked" }, 0 ] },
+                   "fsyncUnlocked": { $gt: [ { $size: "$unlocked" }, 0 ] }
+                } }
+             ] )
+
+          .. output::
+             :language: json
+
+             [ { fsyncLocked: false }, { fsyncUnlocked: true } ]
+
+    .. step:: Restart the Balancer
+
+       To restart the balancer, use the :method:`sh.startBalancer` method:
+
+       .. code-block:: javascript
+
+          sh.startBalancer()
+
+       To confirm that the balancer is running, use the
+       :method:`sh.getBalancerState` method:
+
+       .. io-code-block::
+
+          .. input::
+             :language: javascript
+
+             sh.getBalancerState()
+
+          .. output::
+             :language: javascript
+
+             true
+

- .. include:: /includes/steps/backup-sharded-cluster-with-snapshots.rst
+       The command returns ``true`` when the balancer is running.