DOCS-270 migrate design concepts #196

Merged
merged 3 commits into from
Sep 7, 2012
37 changes: 20 additions & 17 deletions source/administration/replica-sets.txt
@@ -46,11 +46,13 @@ configurations.
.. warning::

The :method:`rs.reconfig()` shell command can force the current
primary to step down, which causes an election. When the primary
primary to step down, which causes an :ref:`election <replica-set-elections>`. When the primary
steps down, the :program:`mongod` closes all client
connections. While this typically takes 10-20 seconds, try to
make these changes during scheduled maintenance periods.

.. include:: /includes/seealso-elections.rst

Contributor

This appears in a couple of places and should be an included file.

I'm not sure that this needs to be a list, but it needs some sort of terminal punctuation.

Author

done!

.. index:: replica set members; secondary only
.. _replica-set-secondary-only-members:
.. _replica-set-secondary-only-configuration:
@@ -69,9 +71,10 @@ these members from ever becoming primary.

To configure a member as secondary-only, set its
:data:`members[n].priority` value to ``0``. Any member with a
:data:`members[n].priority` equal to ``0`` will never seek election and
cannot become primary in any situation. For more information on priority
levels, see :ref:`replica-set-node-priority`.
:data:`members[n].priority` equal to ``0`` will never seek
:ref:`election <replica-set-elections>` and cannot become primary in any
situation. For more information on priority levels, see
:ref:`replica-set-node-priority`.

As an example of modifying member priorities, assume a four-member
replica set with member ``_id`` values of: ``0``, ``1``, ``2``, and
@@ -107,7 +110,7 @@ This sets the following:
If your replica set has an even number of members, add an
:ref:`arbiter <replica-set-arbiters>` to ensure that
members can quickly obtain a majority of votes in an
:ref:`election <replica-set-elections>` for primary.
election for primary.

.. seealso:: :data:`members[n].priority` and :ref:`Replica Set
Reconfiguration <replica-set-reconfiguration-usage>`.
@@ -155,7 +158,7 @@ other members in the set will not advertise the hidden member in the
of ``0``, the operation fails.

.. seealso:: :ref:`Replica Set Read Preference <replica-set-read-preference>`
and :ref:`Replica Set Reconfiguration <replica-set-reconfiguration-usage>`
and :ref:`Replica Set Reconfiguration <replica-set-reconfiguration-usage>`.

.. index:: replica set members; delayed
.. _replica-set-delayed-members:
@@ -183,8 +186,8 @@ the amount of slave delay to apply:

- The size of the oplog is sufficient to capture *more than* the
number of operations that typically occur in that period of
time. See the section on :ref:`oplog sizing
<replica-set-oplog-sizing>` for more information.
time. For more information on oplog size, see the
:ref:`replica-set-oplog-sizing` topic in the :doc:`/core/replication` document.

Delayed members must have a :term:`priority` set to ``0`` to prevent
them from becoming primary in their replica sets. Also these members
@@ -233,7 +236,7 @@ Arbiters

Arbiters are special :program:`mongod` instances that do not hold a
copy of the data and thus cannot become primary. Arbiters exist solely to
participate in :term:`elections <election>`.
participate in :ref:`elections <replica-set-elections>`.

.. note::

@@ -290,15 +293,14 @@ Non-Voting
~~~~~~~~~~

You may choose to change the number of votes that each member has in
:term:`elections <election>` for :term:`primary`. In general, all
:ref:`elections <replica-set-elections>` for :term:`primary`. In general, all
members should have only 1 vote to prevent intermittent ties, deadlock,
or the wrong members from becoming :term:`primary`. Use :ref:`replica
set priorities <replica-set-node-priority>` to control which members
are more likely to become primary.

To disable a member's ability to vote in :ref:`elections
<replica-set-elections>` use the following command sequence in the
:program:`mongo` shell.
To disable a member's ability to vote in elections, use the following
command sequence in the :program:`mongo` shell.

.. code-block:: javascript

@@ -394,7 +396,7 @@ you specify a full configuration object with :method:`rs.add()`, you must
declare the ``_id`` field, which is not automatically populated in
this case.

.. seealso:: :doc:`/tutorial/expand-replica-set`
.. seealso:: :doc:`/tutorial/expand-replica-set`.

.. _replica-set-admin-procedure-remove-members:

@@ -454,7 +456,7 @@ number. :method:`rs.reconfig()` will not change the value of
.. warning::

Any replica set configuration change can trigger the current
:term:`primary` to step down, which forces an :term:`election`. This
:term:`primary` to step down, which forces an :ref:`election <replica-set-elections>`. This
causes the current shell session, and clients connected to this replica set,
to produce an error even when the operation succeeds.

@@ -486,7 +488,7 @@ the new configuration.

If a member has :data:`members[n].priority` set to ``0``, it is
ineligible to become :term:`primary` and will not seek
elections. :ref:`Hidden members <replica-set-hidden-members>`,
election. :ref:`Hidden members <replica-set-hidden-members>`,
:ref:`delayed members <replica-set-delayed-members>`, and
:ref:`arbiters <replica-set-arbiters>` all have :data:`members[n].priority`
set to ``0``.
@@ -741,4 +743,5 @@ data to a :term:`BSON` file that you can view using
You can prevent rollbacks by using the appropriate
:term:`write concern` to ensure safe writes.

.. seealso:: :ref:`Replica Set Elections <replica-set-elections>`
.. include:: /includes/seealso-elections.rst

30 changes: 30 additions & 0 deletions source/applications/replication.txt
@@ -17,6 +17,36 @@ This document describes those options and their implications.
shards are also replica sets provide the same configuration options
with regards to write and read operations.

.. TODO Is any of the following missing from this document:

.. Writes committed at the primary may be visible before the
cluster-wide commit completes. The read uncommitted semantics (an
option on many databases) are more relaxed and make theoretically
achievable performance and availability higher (for example we never
have an object locked in the server where the locking is dependent on
network performance).

.. On a failover, if there are writes which have not replicated from the
primary, the writes are rolled back. To confirm replica-set-wide
commits, use the getLastError command. On a failover, data is backed
up to files in the rollback directory. To recover this data, use
mongorestore.

.. Merging back old operations later, after another member has accepted
writes, is a hard problem. One then has multi-master replication,
with potential for conflicting writes. Typically that is handled in
other products by manual version reconciliation code by developers.
That is too much work. Multi-master also can make atomic operation
semantics problematic. It is possible (as mentioned above) to
manually recover these events via manual DBA effort, but in large
systems with many members such efforts become impractical.

.. Calling getLastError causes the client to wait for a response from
the server. This can slow the client's throughput on writes if large
numbers are made because of the client/server network turnaround
times. Thus for "non-critical" writes it often makes sense to make no
getLastError check at all, or only a single check after many writes.

.. _write-concern:
.. _replica-set-write-concern:

148 changes: 81 additions & 67 deletions source/core/replication-internals.txt
@@ -18,17 +18,17 @@ troubleshooting and for further understanding MongoDB's behavior and approach.
Oplog
-----

Replication itself works by way of a special :term:`capped collection`
called the :term:`oplog`. This collection keeps a rolling record of
all operations applied to the :term:`primary`. Secondary members then
replicate this log by applying the operations to themselves in an
asynchronous process. Under normal operation, :term:`secondary` members
reflect writes within one second of the primary. However, various
exceptional situations may cause secondaries to lag behind further. See
For an explanation of the oplog, see the :ref:`oplog <replica-set-oplog-sizing>`
topic in the :doc:`/core/replication` document.

Under various exceptional
situations, updates to a :term:`secondary's <secondary>` oplog might
lag behind the desired performance time. See
:ref:`Replication Lag <replica-set-replication-lag>` for details.

All members send heartbeats (pings) to all other members in the set and can
import operations to the local oplog from any other member in the set.
All members of a :term:`replica set` send heartbeats (pings) to all
other members in the set and can import operations to the local oplog
from any other member in the set.

Replica set oplog operations are :term:`idempotent`. The following
operations require idempotency:
@@ -37,20 +37,21 @@ operations require idempotency:
- post-rollback catch-up
- sharding chunk migrations
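To show why replaying entries during post-rollback catch-up or chunk migrations must be safe, here is a simplified model in plain JavaScript. It assumes a hypothetical entry format, `{ op: "set", id, field, value }`, in which an increment is recorded as the value it produced rather than as the increment itself. This is not MongoDB's actual oplog document layout; the helper names are also hypothetical.

```javascript
// Hypothetical, simplified oplog entry: { op: "set", id, field, value }.
// Recording the *result* of an update makes each entry safe to re-apply.
function applyEntry(docs, entry) {
  if (entry.op !== "set") {
    throw new Error("unsupported op in this sketch: " + entry.op);
  }
  const doc = docs.get(entry.id) || { _id: entry.id };
  doc[entry.field] = entry.value;
  docs.set(entry.id, doc);
}

// Primary side: turn an increment into an idempotent "set" entry.
function recordIncrement(docs, log, id, field, amount) {
  const doc = docs.get(id) || { _id: id };
  const entry = { op: "set", id, field, value: (doc[field] || 0) + amount };
  applyEntry(docs, entry);
  log.push(entry);
}

function replay(docs, log) {
  for (const entry of log) {
    applyEntry(docs, entry);
  }
}

const primaryDocs = new Map();
const opEntries = [];
recordIncrement(primaryDocs, opEntries, 1, "count", 5);
recordIncrement(primaryDocs, opEntries, 1, "count", 2);

// Replaying the same entries twice leaves the copy in the same state.
const secondaryDocs = new Map();
replay(secondaryDocs, opEntries);
replay(secondaryDocs, opEntries);
console.log(secondaryDocs.get(1).count); // 7
```

Had the entries stored "add 5" and "add 2" instead, the second replay would have produced 14.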

.. seealso:: The :ref:`replica-set-oplog-sizing` topic in
:doc:`/core/replication`.

.. TODO Verify that "sharding chunk migrations" (above) requires
idempotency. The wiki was unclear on the subject.

.. In 2.0, replicas would import entries from the member with the lowest
.. "ping." This wasn't true in 1.8 and will likely change in 2.2.

.. _replica-set-data-integrity:
.. _replica-set-implementation:

Implementation
Data Integrity
--------------

Read Preferences
~~~~~~~~~~~~~~~~

MongoDB uses :term:`single-master replication` to ensure that the
database remains consistent. However, clients may modify the
:ref:`read preferences <replica-set-read-preference>` on a
@@ -59,10 +60,9 @@ per-connection basis in order to distribute read operations to the
greater query throughput by distributing reads to secondary members. But
keep in mind that replication is asynchronous; therefore, reads from
secondaries may not always reflect the latest writes to the
:term:`primary`. See the :ref:`consistency <replica-set-consistency>`
section for more about :ref:`read preference
<replica-set-read-preference>` and :ref:`write concern
<replica-set-write-concern>`.
:term:`primary`.
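The staleness described above can be sketched with a small in-memory model. Everything here (`primaryStore`, `secondaryStore`, `replicationLog`, `replicate`) is hypothetical, and replication that MongoDB performs asynchronously is modeled as an explicit function call so the lag is easy to see.

```javascript
// Hypothetical in-memory model of one primary and one secondary.
// Writes go to the primary and are appended to a log; the secondary
// applies the log later, so reads from it can return old values.
const primaryStore = new Map();
const secondaryStore = new Map();
const replicationLog = [];
let secondaryApplied = 0; // number of log entries the secondary has applied

function write(key, value) {
  primaryStore.set(key, value);
  replicationLog.push({ key, value });
}

// In MongoDB this happens asynchronously; here it is an explicit step.
function replicate() {
  while (secondaryApplied < replicationLog.length) {
    const { key, value } = replicationLog[secondaryApplied];
    secondaryStore.set(key, value);
    secondaryApplied += 1;
  }
}

write("x", 1);
replicate();
write("x", 2);
// Before the secondary catches up, a read from it returns the old value.
console.log(primaryStore.get("x"), secondaryStore.get("x")); // 2 1
replicate();
console.log(secondaryStore.get("x")); // 2
```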

.. seealso:: :ref:`replica-set-consistency`

.. note::

@@ -71,16 +71,12 @@ section for more about :ref:`read preference
output to assess the current state of replication and determine if
there is any unintended replication delay.

In the default configuration, all members have an equal chance of
becoming primary; however, it's possible to set :data:`priority <members[n].priority>` values that
weight the election. In some architectures, there may be operational
reasons for increasing the likelihood of a specific replica set member
becoming primary. For instance, a member located in a remote data
center should *not* become primary. See: :ref:`node
priority <replica-set-node-priority>` for more background on this
concept.

Replica sets can also include members with the following four special
.. _replica-set-member-configurations-internals:

Contributor

This duplicates the content in http://docs.mongodb.org/manual/applications/replication/#write-concern

I think we can kill it here; some improvements to the applications page might be in order.

Author

I've removed this here and moved a condensed version over as a TODO comment in the applications page

Member Configurations
---------------------

Replica sets can include members with the following four special
configurations that affect membership behavior:

- :ref:`Secondary-only <replica-set-secondary-only-members>` members have
@@ -106,6 +102,12 @@ unique set of administrative requirements and concerns. Choosing the
right :doc:`system architecture </administration/replication-architectures>`
for your data set is crucial.

.. seealso:: The :ref:`replica-set-member-configurations` topic in the
:doc:`/administration/replica-sets` document.

Security
--------

Administrators of replica sets also have unique :ref:`monitoring
<replica-set-monitoring>` and :ref:`security <replica-set-security>`
concerns. The :ref:`replica set functions <replica-set-functions>` in
@@ -122,35 +124,46 @@ modify the configuration of an existing replica set.
Elections
---------

When you initialize a :term:`replica set` for the first time, or when any
failover occurs, an election takes place to decide which member should
Elections are the process :term:`replica set` members use to select which member should
become :term:`primary`. A primary is the only member in the replica
set that can accept write operations, including :method:`insert()
<db.collection.insert()>`, :method:`update() <db.collection.update()>`,
and :method:`remove() <db.collection.remove()>`.

Elections are the process replica set members use to
select the primary in a set. Two types of events can trigger an election:
a primary steps down or a :term:`secondary` member
loses contact with a primary. All members have one vote
in an election, and any :program:`mongod` can veto an election. A
single veto invalidates the election.

An existing primary will step down in response to the
:dbcommand:`replSetStepDown` command or if it sees that one of
the current secondaries is eligible for election *and* has a higher
priority. A secondary will call for an election if it cannot
establish a connection to a primary. A primary will also step
down when it cannot contact a majority of the members of the replica
set. When the current primary steps down, it closes all open client
connections to prevent clients from unknowingly writing data to a
non-primary member.

In an election, every member, including :ref:`hidden
<replica-set-hidden-members>` members, :ref:`arbiters
<replica-set-arbiters>`, and even recovering members, get a single
vote. Members will give votes to every eligible member that calls an
election.
The following events can trigger an election:

- You initialize a replica set for the first time.

- A primary steps down. A primary will step down in response to the
:dbcommand:`replSetStepDown` command or if it sees that one of the
current secondaries is eligible for election *and* has a higher
priority. A primary also will step down when it cannot contact a
majority of the members of the replica set. When the current primary
steps down, it closes all open client connections to prevent clients
from unknowingly writing data to a non-primary member.

- A :term:`secondary` member loses contact with a primary. A secondary
will call for an election if it cannot establish a connection to a
primary.

- A :term:`failover` occurs.

In an election, all members have one vote,
including :ref:`hidden <replica-set-hidden-members>` members, :ref:`arbiters
<replica-set-arbiters>`, and even recovering members.
Any :program:`mongod` can veto an election.

In the default configuration, all members have an equal chance of
becoming primary; however, it's possible to set :data:`priority
<members[n].priority>` values that weight the election. In some
architectures, there may be operational reasons for increasing the
likelihood of a specific replica set member becoming primary. For
instance, a member located in a remote data center should *not* become
primary. See :ref:`replica-set-node-priority` for more
information.

Any member of a replica set can veto an election, even if the
member is a :ref:`non-voting member <replica-set-non-voting-members>`.

A member of the set will veto an election under the following
conditions:
@@ -167,15 +180,10 @@ conditions:
(i.e. a higher "optime") than the member seeking election, from the
perspective of the voting member.

- The current primary will also veto an election if it has the same or
- The current primary will veto an election if it has the same or
more recent operations (i.e. a "higher or equal optime") than the
member seeking election.
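The current primary's veto rule above reduces to an optime comparison. This minimal sketch models optimes as plain numbers, which is a simplification, and the function names `primaryVetoes` and `electionVetoed` are hypothetical. It also encodes the rule that a single veto invalidates an election, whichever member casts it.

```javascript
// Hypothetical sketch: optimes modeled as plain numbers, where a larger
// number means a more recent operation.
function primaryVetoes(primaryOptime, candidateOptime) {
  // The current primary vetoes if it has the same or more recent
  // operations ("higher or equal optime") than the candidate.
  return primaryOptime >= candidateOptime;
}

// A single veto invalidates the election; any member, including a
// non-voting member, can cast one.
function electionVetoed(vetoes) {
  return vetoes.some(Boolean);
}

console.log(primaryVetoes(100, 100)); // true
console.log(primaryVetoes(100, 101)); // false
console.log(electionVetoed([false, false, true])); // true
```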

.. note::

Any member of a replica set *can* veto an election, even if the
member is a :ref:`non-voting member <replica-set-non-voting-members>`.

The first member to receive votes from a majority of members in a set
becomes the next primary until the next election. Be
aware of the following conditions and possible situations:
@@ -186,15 +194,9 @@ aware of the following conditions and possible situations:

- Replica set members compare priorities only with other members of
the set. The absolute value of priorities does not have any impact on
the outcome of replica set elections.

.. note::

The only exception is that members with :data:`priority
<members[n].priority>` values of ``0``
cannot become primary and will not seek election. See
:ref:`replica-set-node-priority-configuration` for more
information.
the outcome of replica set elections, with the exception of the value ``0``,
which indicates the member cannot become primary and cannot seek election.
For details, see :ref:`replica-set-node-priority-configuration`.

- A replica set member cannot become primary *unless* it has the
highest "optime" of any visible member in the set.
@@ -204,12 +206,24 @@ aware of the following conditions and possible situations:
primary until the member with the highest priority catches up
to the latest operation.


.. seealso:: :ref:`Non-voting members in a replica
set <replica-set-non-voting-members>`,
:ref:`replica-set-node-priority-configuration`, and
:data:`replica configuration <members[n].votes>`.

Elections and Network Partitions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. TODO The following two paragraphs need review -BG

Members on either side of a network partition cannot see each other when
determining whether a majority is available to hold an election.

That means that if a primary steps down and neither side of the
partition has a majority on its own, the set will not elect a new
primary and the set will become read only. The best practice is to have
a majority of servers in one data center and one server in another.
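The majority arithmetic behind this guidance can be checked with a short sketch. It assumes every member holds exactly one vote and that the partition splits the set into two groups; `majority` and `sideWithMajority` are hypothetical helpers.

```javascript
// Votes needed for a strict majority of the whole set.
function majority(totalVotes) {
  return Math.floor(totalVotes / 2) + 1;
}

// Returns which side of a two-way partition, if any, can elect a primary.
function sideWithMajority(votesSideA, votesSideB) {
  const needed = majority(votesSideA + votesSideB);
  if (votesSideA >= needed) return "A";
  if (votesSideB >= needed) return "B";
  return null; // neither side: no new primary, the set is read only
}

console.log(sideWithMajority(2, 2)); // null
console.log(sideWithMajority(3, 1)); // A
```

With four members split two and two across data centers, neither side can elect a primary; with three in one data center and one in the other, the larger side can.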

Syncing
-------
