Commit 3bd7da3

DOCSP-44991 -- Resiliency 2nd Draft (#78)
* DOCSP-44991 -- rebuild staging
* DOCSP-44991 -- add subheadings to on page toc
* DOCSP-44991 -- add link to replication page
* DOCSP-44991 -- external review revisions
* DOCSP-44991 -- copy review revisions
1 parent 713f25d commit 3bd7da3

File tree: 1 file changed, +46 −44 lines changed


source/resiliency.txt

@@ -1,8 +1,8 @@
 .. _arch-center-resiliency:
 
-===================================
-Application and Database Resiliency
-===================================
+=================================================
+Atlas Features and Recommendations for Resiliency
+=================================================
 
 .. default-domain:: mongodb
 
@@ -28,16 +28,17 @@ Features
 Database Replication
 ````````````````````
 
-|service| {+clusters+} consist of a minimum of three nodes, and you can increase
-the node count to any odd number of nodes you require. |service| first writes data
-from your application to a primary node, and then |service| incrementally replicates
-and stores that data across all secondary nodes within your {+cluster+}. Additionally,
-you can control the durability of your data storage by adjusting the write concern
-of your application code to complete the write only once a certain number of secondaries
+|service| {+clusters+} consist of a `replica set <https://www.mongodb.com/docs/manual/replication/>`__
+with a minimum of three nodes, and you can increase the node count to any odd
+number of nodes you require. |service| first writes data from your application
+to a `primary node <https://www.mongodb.com/docs/manual/core/replica-set-primary/>`__, and then |service| incrementally replicates and stores that
+data across all `secondary nodes <https://www.mongodb.com/docs/manual/core/replica-set-secondary/>`__ within your {+cluster+}. To
+control the durability of your data storage, you can adjust the `write concern <https://www.mongodb.com/docs/manual/reference/write-concern/>`__ of
+your application code to complete the write only once a certain number of secondaries
 have committed the write. To learn more, see :ref:`resiliency-read-write-concerns`.
 
 By default, |service| distributes {+cluster+} nodes across availability zones within
-one of your chosen cloud provider's availability regions. For example, if your
+one of your chosen cloud provider's `availability regions <https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html>`__. For example, if your
 {+cluster+} is deployed to the cloud provider region ``us-east``, |service| deploys
 nodes to ``us-east-a``, ``us-east-b`` and ``us-east-c`` by default.
 
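The acknowledgment arithmetic behind a ``majority`` write concern can be sketched in a few lines of Python (illustrative only; the function name is not a driver API):

```python
def majority(node_count: int) -> int:
    """Smallest number of nodes that constitutes a majority of the
    replica set -- the acknowledgment threshold for a ``majority``
    write concern."""
    return node_count // 2 + 1

# A default three-node cluster acknowledges a majority write after
# two nodes (the primary plus one secondary) have committed it.
print(majority(3))  # 2
print(majority(5))  # 3
```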
@@ -47,13 +48,14 @@ see :ref:`arch-center-high-availability`.
 Self-Healing Deployments
 ````````````````````````
 
-|service| {+clusters+} must consist of an odd number of nodes, because only one
-node can be elected as the primary node to and from which your application writes
-and reads directly.
+|service| {+clusters+} must consist of an odd number of nodes, because the node
+pool must elect a primary node to and from which your application writes
+and reads directly. A cluster consisting of an even number of nodes might
+result in a deadlock that prevents a primary node from being elected.
 
 In the event that a primary node is unavailable, because of infrastructure
 outages, maintenance windows or any other reason, |service| {+clusters+} self-heal
-by converting an existing secondary node into your primary node to maintain
+by promoting an existing secondary node to the role of primary node to maintain
 database availability. To learn more about this process, see `How does MongoDB Atlas deliver high availability? <https://www.mongodb.com/docs/atlas/reference/faq/deployment/#how-does-service-fullname-deliver-high-availability->`__
 
 Maintenance Window Uptime
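During the failover described above, an application typically sees a brief window of transient connection errors while a new primary is elected. A minimal, driver-agnostic retry sketch (the ``flaky_insert`` stand-in and the use of ``ConnectionError`` are illustrative assumptions, not MongoDB driver APIs):

```python
import time

def run_with_retry(operation, retries=3, base_delay=0.1):
    """Retry an operation that may fail transiently while the
    replica set elects a new primary node."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except ConnectionError:  # stand-in for a driver's transient error
            if attempt == retries:
                raise
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff

# Illustrative stand-in that fails twice before the new primary is ready.
calls = {"count": 0}
def flaky_insert():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("no primary available")
    return "ok"

print(run_with_retry(flaky_insert, base_delay=0.01))  # ok
```

Note that modern drivers already retry many operations once automatically when ``retryWrites=true`` is set; an application-level loop like this is only for workloads that must survive longer election windows.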
@@ -134,15 +136,15 @@ related to resilience:
 Connecting Your Application to |service|
 `````````````````````````````````````````
 
-We recommend that you use the most `current driver version <https://www.mongodb.com/docs/drivers/>`__
+We recommend that you use a connection method built on the most `current driver version <https://www.mongodb.com/docs/drivers/>`__
 for your application's programming language whenever possible. And while the
 default connection string |service| provides is a good place to start, you might
 want to tune it for performance in the context of your specific application
 and deployment architecture.
 
-For example, you might want to set a short :urioption:`connectTimeoutMS` for a
+For example, you might want to set a short ``maxTimeMS`` for a
 microservice that provides a login capability, whereas you may want to set the
-``connectTimeoutMS`` to a much larger value if the application code is a long-running
+``maxTimeMS`` to a much larger value if the application code is a long-running
 analytics job request against the cluster.
 
 `Tuning your connection pool settings <https://www.mongodb.com/docs/manual/tutorial/connection-pool-performance-tuning/>`__
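The per-workload ``maxTimeMS`` idea above can be sketched as a lookup of timeout buckets (the bucket names and values are illustrative assumptions; in PyMongo, for example, the limit is applied per operation via a cursor's ``max_time_ms``):

```python
# Illustrative per-workload server-side time limits, in milliseconds.
MAX_TIME_MS = {
    "login": 2_000,        # fail fast so end users are not left waiting
    "general": 30_000,     # middle-tier bucket for general purpose requests
    "analytics": 600_000,  # long-running analytics jobs may take minutes
}

def time_limit_for(workload: str) -> int:
    """Look up the ``maxTimeMS`` value for a class of query."""
    return MAX_TIME_MS[workload]

print(time_limit_for("login"))  # 2000
```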
@@ -162,7 +164,7 @@ application.
 
 For example, if you are scaling your |service| {+cluster+} to meet user demand,
 consider what the minimum pool size of connections your application will
-consistently need, so that when the connection pool scales the additional
+consistently need, so that when the application pool scales the additional
 networking and compute load that comes with opening new client connections
 doesn't undermine your application's time-sensitive need for increased
 database operations.
@@ -171,37 +173,34 @@ Min and Max Connection Pool Size
 `````````````````````````````````
 
 If your ``minPoolSize`` and ``maxPoolSize`` values are similar, the majority of your
-database client connections will open at application startup. In turn, the
-additional networking load that comes with opening such connections will happen
-at the same time. However, if there is a large range in size between your
-minimum and maximum pool size, additional connections are opened more frequently
-during application runtime.
-
-This process of incrementally increasing your connection pool size during
-application runtime distributes the total workload of connecting clients from
-your application to |service| over a longer period of time, which often makes it
-manageable for a given use case, but it is important to note that the associated
-increase in network load occurs during application runtime, which has
-the potential to impact perceived database - and by extension - application
-performance for end-users.
+database client connections open at application startup. For example, if your
+``minPoolSize`` is set to ``10`` and your ``maxPoolSize`` is set to ``12``, 10
+client connections open at application startup, and only 2 more connections
+can then be opened during application runtime. However, if your ``minPoolSize``
+is set to ``10`` and your ``maxPoolSize`` is set to ``100``, up to 90 additional
+connections can be opened as needed during application runtime.
+
+Additional network overhead is associated with opening new client connections.
+So, consider whether you would prefer to incur that network cost at
+application startup, or to incur it dynamically on an as-needed basis during
+application runtime, which has the potential to impact operational latency and
+perceived performance for end-users if there is a sudden spike in requests that
+requires a large number of additional connections to be opened at once.
 
 Your application's architecture is central to this consideration. If, for example,
-you deploy your application as microservices in an elastic environment, consider
-which services should call |service| directly as a means of controlling the
-dynamic expansion and contraction of your connection pool.
+you deploy your application as microservices, consider which services should
+call |service| directly as a means of controlling the dynamic expansion and
+contraction of your connection pool. Alternatively, if your application deployment
+is leveraging single-threaded resources, like AWS Lambda, your application will
+only ever be able to open and use one client connection, so your ``minPoolSize``
+and your ``maxPoolSize`` should both be set to ``1``.
 
 Query Timeout
 `````````````
 
 Almost invariably, workload-specific queries from your application will vary in
 terms of the amount of time they take to execute in |service| and in terms of
-the amount of time your application can wait for a response.
-
-Consider defining query classes that handle categories or buckets of similar
-request requirements. For example, you can define a query category with a fast
-timeout for end-user driven requests, a middle tier timeout bucket for general
-purpose requests, and a long-running query class for things like analytics
-queries that require the most time to execute in |service|.
+the amount of time your application can wait for a response.
 
 You can set `query timeout <https://www.mongodb.com/docs/manual/tutorial/query-documents/specify-query-timeout/>`__
 behavior globally in |service|, and you can also define it at the query level.
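The pool-size pairings discussed above (``10``/``12``, ``10``/``100``, and ``1``/``1`` for single-threaded environments) can be expressed as standard connection string options. A sketch, assuming a placeholder host and credentials rather than a real cluster:

```python
from urllib.parse import urlencode

# Placeholder connection string; substitute your own cluster's host.
BASE_URI = "mongodb+srv://user:password@cluster0.example.mongodb.net/?"

def pooled_uri(min_pool: int, max_pool: int) -> str:
    """Append connection pool bounds as connection string options."""
    return BASE_URI + urlencode({"minPoolSize": min_pool,
                                 "maxPoolSize": max_pool})

front_loaded = pooled_uri(10, 12)   # most connection cost paid at startup
on_demand = pooled_uri(10, 100)     # up to 90 extra connections at runtime
single_threaded = pooled_uri(1, 1)  # e.g. one connection for AWS Lambda
```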
@@ -220,12 +219,16 @@ Configure Read and Write Concerns
 `````````````````````````````````
 
 |service| {+clusters+} eventually replicate all data across all nodes. However,
-you can configure the number of nodes across which data must be repicated before
+you can configure the number of nodes across which data must be replicated before
 a read or write operation is reported to have been successful. You can define
 `read concerns <https://www.mongodb.com/docs/manual/reference/read-concern/>`__ and
 `write concerns <https://www.mongodb.com/docs/manual/reference/write-concern/>`__
 globally in |service|, and you can also define them at the client level in your
-connection string.
+connection string. |service| has a default write concern of ``majority``, meaning that
+data must be replicated across more than half of the nodes in your cluster
+before |service| reports success. Conversely, |service| has a default read concern
+of ``local``, which means that when queried, |service| retrieves data from only
+one node in your cluster.
 
 .. _arch-center-move-collection:
 
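Setting these concerns at the client level in the connection string can be sketched as follows; ``w`` and ``readConcernLevel`` are standard connection string options, while the host and credentials are placeholders:

```python
from urllib.parse import urlencode

# Placeholder connection string; substitute your own cluster's host.
BASE_URI = "mongodb+srv://user:password@cluster0.example.mongodb.net/?"

def uri_with_concerns(write_concern="majority", read_concern="local") -> str:
    """Express client-level read and write concerns as connection
    string options. The defaults here mirror the Atlas defaults."""
    return BASE_URI + urlencode({"w": write_concern,
                                 "readConcernLevel": read_concern})

print(uri_with_concerns())  # ends with w=majority&readConcernLevel=local
```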
@@ -247,4 +250,3 @@ Resilient Example Application
 `````````````````````````````
 
 .. include:: /includes/cloud-docs/example-resilient-app.rst
-
