diff --git a/source/administration/sharding-architectures.txt b/source/administration/sharding-architectures.txt
index 176c5b11c52..b234a44c7a4 100644
--- a/source/administration/sharding-architectures.txt
+++ b/source/administration/sharding-architectures.txt
@@ -31,6 +31,8 @@ and development. Such a cluster will have the following components:
 
 - 1 :program:`mongos` instance.
 
+.. _sharding-production-deployment:
+
 Deploying a Production Cluster
 ------------------------------
 
@@ -38,42 +40,22 @@ When deploying a shard cluster to production, you must ensure that the
 data is redundant and that your individual nodes are highly available.
 To that end, a production-level shard cluster should have the
 following:
 
-- 3 :ref:`config servers <sharding-config-server>`, each residing on a
-  separate node.
-
-- For each shard, a three member :term:`replica set <replica set>`
-  consisting of:
-
-  - 3 :program:`mongod` replicas or
-
-    .. seealso:: ":doc:`/administration/replication-architectures`"
-       and ":doc:`/administration/replica-sets`."
-
-  - 2 :program:`mongod` replicas and a single
-    :program:`mongod` instance acting as a :term:`arbiter`.
-
-    .. optional::
-
-       All replica set configurations and options are available.
-
-       You may also choose to deploy a :ref:`hidden member
-       <replica-set-hidden-members>` for backups or a
-       :ref:`delayed member <replica-set-delayed-members>`.
+- 3 :ref:`config servers <sharding-config-server>`, each residing on a
+  separate system.
 
-  You might also keep a member of each replica set in a
-  geographically distinct data center in case the primary data
-  center becomes unavailable.
-
-  See ":doc:`/replication`" for more information on replication
-  and :term:`replica sets <replica set>`.
-
-  .. seealso:: The ":ref:`sharding-procedure-add-shard`" and
-     ":ref:`sharding-procedure-remove-shard`" procedures for more
-     information.
+- A three member :term:`replica set` for each shard.
 
 - :program:`mongos` instances. Typically, you will deploy a single
   :program:`mongos` instance on every application server.
   Alternatively, you may deploy several `mongos` nodes and let your
   application connect to these via a load balancer.
 
+.. seealso:: ":doc:`/administration/replication-architectures`"
+   and ":doc:`/administration/replica-sets`."
+
+.. seealso:: The ":ref:`sharding-procedure-add-shard`" and
+   ":ref:`sharding-procedure-remove-shard`" procedures for more
+   information.
+
 Sharded and Non-Sharded Data
 ----------------------------
 
@@ -118,3 +100,66 @@ created subsequently, may reside on any shard in the cluster.
 .. [#overloaded-primary-term] The term "primary" in the context of
    databases and sharding, has nothing to do with the term
    :term:`primary` in the context of :term:`replica sets <replica set>`.
+
+Failover Scenarios in a Shard Cluster
+-------------------------------------
+
+A properly deployed MongoDB shard cluster has no single point of
+failure. This section describes the potential points of failure
+within a shard cluster and the appropriate response to each.
+
+For reference, a properly deployed MongoDB shard cluster consists of:
+
+- 3 :term:`config database` servers,
+
+- a three member :term:`replica set` for each shard, and
+
+- a :program:`mongos` instance running on each application server.
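+
+As an illustration, here is a minimal sketch of how such a deployment
+might be assembled from the :program:`mongo` shell once the config
+servers and shard replica sets are running. The replica set names and
+hostnames below are hypothetical placeholders:
+
+.. code-block:: javascript
+
+   // Connect to a mongos instance and add each shard's replica set.
+   sh.addShard( "shard0/node0.example.net:27018,node1.example.net:27018" )
+   sh.addShard( "shard1/node3.example.net:27018,node4.example.net:27018" )
+
+   // Enable sharding for a database, then shard one of its
+   // collections on an appropriate shard key.
+   sh.enableSharding( "records" )
+   sh.shardCollection( "records.people", { zipcode: 1 } )
+
+See the ":ref:`sharding-procedure-add-shard`" procedure referenced
+above for the complete process.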
+
+Potential failure scenarios follow; a short diagnostic sketch appears
+after the list.
+
+- A :program:`mongos` instance or an application server fails.
+
+  Because each application server runs its own :program:`mongos`
+  instance, the database remains accessible to the other application
+  servers and the data remains intact. :program:`mongos` is
+  stateless: if it fails, no critical information is lost. When
+  :program:`mongos` restarts, it retrieves a copy of the
+  configuration from the :term:`config database` and resumes working.
+
+  Suggested course of action: restart the application server and/or
+  its :program:`mongos` instance.
+
+- A single :program:`mongod` instance fails within a shard.
+
+  Because each shard is a three member :term:`replica set`, the shard
+  can tolerate the failure of a single member. If the failed member
+  was the :term:`primary`, the two remaining :term:`secondary`
+  members, each of which holds a complete copy of the shard's data,
+  will elect a new primary to replace it.
+
+  Suggested course of action: investigate the failure and replace the
+  failed member as soon as possible. The loss of additional members
+  of the same shard will reduce availability and jeopardize the
+  redundancy of the shard's data.
+
+- All three replica set members of a shard fail.
+
+  The data within that shard will be unavailable. However, data on
+  the cluster's other shards remains available, and applications can
+  continue to write data to chunks that reside on those shards.
+
+  Suggested course of action: investigate the situation immediately.
+
+- A :term:`config database` server fails.
+
+  The :term:`config database` is deployed on three servers, which use
+  a two-phase commit to remain synchronized. If one of these servers
+  fails, the shard cluster will continue to operate as normal, but
+  :ref:`chunk migration` will not occur.
+
+  Suggested course of action: replace the failed :term:`config
+  database` server as soon as possible. Without the ability to
+  migrate chunks, the shards may become unbalanced. The loss of
+  additional :term:`config database` servers will put the shard
+  cluster's metadata in jeopardy.
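+
+When investigating any of the failures above, the following shell
+helpers provide a reasonable starting point. This is only a sketch;
+see the replica set and sharding documents referenced above for
+complete diagnostic procedures:
+
+.. code-block:: javascript
+
+   // From a mongos: report the shards and how chunks are
+   // distributed across them.
+   sh.status()
+
+   // From any surviving member of a shard's replica set: report the
+   // state and health of each member, including the current primary.
+   rs.status()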