Replication lag #147

Merged
merged 4 commits into from
Aug 28, 2012
24 changes: 13 additions & 11 deletions source/administration/monitoring.txt
@@ -339,11 +339,11 @@ This returns all operations that lasted longer than 100 milliseconds.
Ensure that the value specified here (i.e. ``100``) is above the
:setting:`slowms` threshold.
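
The query that this refers to appears above this hunk; as a sketch, a filter
on the ``system.profile`` collection along these lines returns the profiled
operations that took longer than 100 milliseconds:

.. code-block:: javascript

   // Assumes profiling is enabled on the current database; ``millis``
   // holds each profiled operation's duration in milliseconds.
   db.system.profile.find( { millis : { $gt : 100 } } )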

.. seealso:: The ":wiki:`Optimization`" wiki page addresses strategies
.. seealso:: The :wiki:`Optimization` wiki page addresses strategies
that may improve the performance of your database queries and
operations.

.. STUB ":doc:`/applications/optimization`"
.. STUB :doc:`/applications/optimization`

.. _replica-set-monitoring:

@@ -355,30 +355,32 @@ replica sets, beyond the requirements for any MongoDB instance is
"replication lag." This refers to the amount of time that it takes a
write operation on the :term:`primary` to replicate to a
:term:`secondary`. Some very small delay period may be acceptable;
however, as replication lag grows two significant problems emerge:
however, as replication lag grows, two significant problems emerge:

- First, operations that have occurred in the period of lag are not
replicated to one or more secondaries. If you're using replication
to ensure data persistence, exceptionally long delays may impact the
integrity of your data set.

- Second, if the replication lag exceeds the length of the operation
log (":term:`oplog`") then the secondary will have to resync all data
log (:term:`oplog`) then the secondary will have to resync all data
from the :term:`primary` and rebuild all indexes. In normal
circumstances this is uncommon given the typical size of the oplog,
but presents a major problem.
but it's an issue to be aware of.

For causes of replication lag, see :ref:`Replication Lag <replica-set-replication-lag>`.

Replication issues are most often the result of network connectivity
issues between members or a :term:`primary` instance that does not
issues between members or the result of a :term:`primary` that does not
have the resources to support application and replication traffic. To
check the status of a replica use the :dbcommand:`replSetGetStatus` or
check the status of a replica, use the :dbcommand:`replSetGetStatus` or
the following helper in the shell:

.. code-block:: javascript

rs.status()

See the ":doc:`/reference/replica-status`" document for a more in
See the :doc:`/reference/replica-status` document for a more in
depth overview of this output. In general, watch the value of
:status:`optimeDate`. Pay particular attention to the difference in
time between the :term:`primary` and the :term:`secondary` members.
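
As a rough sketch, assuming the ``members[n].state`` and
``members[n].optimeDate`` fields described in that document, you can compare
each member's ``optimeDate`` against the primary's directly in the shell:

.. code-block:: javascript

   // Sketch: print each member's approximate lag behind the primary.
   var status = rs.status();
   var primary = null;
   for (var i = 0; i < status.members.length; i++) {
       // state 1 identifies the current primary.
       if (status.members[i].state === 1) {
           primary = status.members[i];
       }
   }
   for (var j = 0; j < status.members.length; j++) {
       // Subtracting two dates yields the difference in milliseconds.
       var lag = (primary.optimeDate - status.members[j].optimeDate) / 1000;
       print(status.members[j].name + ": " + lag + " seconds behind the primary");
   }
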
@@ -393,7 +395,7 @@ option, :program:`mongod` will create a default sized oplog.
By default the oplog is 5% of total available disk space on 64-bit
systems.
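
To check the current oplog allocation and the window of time it covers, the
``db.printReplicationInfo()`` helper in the :program:`mongo` shell gives a
quick summary; run it while connected to a member of the set:

.. code-block:: javascript

   // Prints the configured oplog size and the time range of the
   // operations it currently holds.
   db.printReplicationInfo()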

.. seealso:: ":doc:`/tutorial/change-oplog-size`"
.. seealso:: :doc:`/tutorial/change-oplog-size`

Sharding and Monitoring
-----------------------
@@ -404,10 +406,10 @@ instances. Additionally, shard clusters require monitoring to ensure
that data is effectively distributed among nodes and that sharding
operations are functioning appropriately.
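
For a quick, shell-level spot check of that distribution, one starting point
is the ``db.printShardingStatus()`` helper, run against a :program:`mongos`
instance; treat this as a sketch rather than a complete monitoring strategy:

.. code-block:: javascript

   // From a mongos, prints the shards, the sharded databases and
   // collections, and the chunk distribution across the cluster.
   db.printShardingStatus()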

.. seealso:: See the ":wiki:`Sharding`" wiki page for more
.. seealso:: See the :wiki:`Sharding` wiki page for more
information.

.. STUB ":doc:`/core/sharding`"
.. STUB :doc:`/core/sharding`

Config Servers
~~~~~~~~~~~~~~
44 changes: 36 additions & 8 deletions source/administration/replica-sets.txt
@@ -528,21 +528,24 @@ Replication Lag
~~~~~~~~~~~~~~~

Replication lag is a delay between an operation on the :term:`primary`
and the application of that operation from :term:`oplog` to the
and the application of that operation from the :term:`oplog` to the
:term:`secondary`. Such lag can be a significant issue and can
seriously affect MongoDB :term:`replica set` deployments. Excessive
replication lag makes "lagged" members ineligible to quickly become
primary and increases the possibility that distributed
read operations will be inconsistent.

Identify replication lag by checking the values of
Identify replication lag by checking the value of
:data:`members[n].optimeDate` for each member of the replica set
using the :method:`rs.status()` function in the :program:`mongo`
shell.

Also, you can monitor how fast replication occurs by watching the oplog
time in the "replica" graph in MMS.
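
The shell also provides a helper that summarizes how far each secondary
trails the primary; its exact output varies by version, but a minimal check
looks like this:

.. code-block:: javascript

   // Prints, for each secondary, the time of the last oplog entry it
   // has applied and how long ago that was.
   db.printSlaveReplicationInfo()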

Possible causes of replication lag include:

- **Network Latency.**
- **Network Latency**

Check the network routes between the members of your set to ensure
that there is no packet loss or network routing issue.
@@ -551,7 +554,7 @@ Possible causes of replication lag include:
members and ``traceroute`` to expose the routing of packets between
network endpoints.

- **Disk Throughput.**
- **Disk Throughput**

If the file system and disk device on the secondary are
unable to flush data to disk as quickly as the primary, then
@@ -564,16 +567,41 @@ Possible causes of replication lag include:
Use system-level tools to assess disk status, including
``iostat`` or ``vmstat``.

- **Concurrency.**
- **Concurrency**

In some cases, long-running operations on the primary can block
replication on secondaries. You can use
:term:`write concern` to prevent write operations from returning
if replication cannot keep up with the write load.
replication on secondaries. You can use :term:`write concern` to
prevent write operations from returning if replication cannot keep up
with the write load.

Use the :term:`database profiler` to see if there are slow queries
or long-running operations that correspond to the incidences of lag.

- **Appropriate Write Concern**

If you are performing a large data load that requires a very high
number of writes to the primary, and if you have not set the
appropriate write concern, the secondaries will not be able to read
the oplog fast enough to keep up with changes. Write requests take
precedence over read requests, and a very large number of writes will
significantly reduce the number of reads the secondaries can make
from the oplog in order to update themselves.

The replication lag can grow to the point that the oplog overwrites
commands that the secondaries have not yet read. The oplog is a capped
collection, and when full, it erases the oldest commands in order to
write new ones. If the secondaries get too far behind in their reads,
they reach a point where they no longer have access to certain
updates, and they become stale.

To prevent this, use :term:`write concern` to tell MongoDB to always perform
a safe write after a designated number of inserts, such as after every
1,000 inserts, as in the sketch following this list. This gives the
secondaries room to perform reads and catch up with the primary. Using safe
writes slightly slows down the data load but keeps your secondaries from
going stale.

See :ref:`replica-set-write-concern` for more information.
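
A minimal sketch of this pattern in the :program:`mongo` shell follows; the
``records`` array and the ``data`` collection are hypothetical placeholders,
and ``w: 2`` assumes a set where acknowledgement from one secondary is
sufficient:

.. code-block:: javascript

   // Sketch: pause for replication after every 1,000 inserts so the
   // secondaries can catch up before the load continues.
   for (var i = 0; i < records.length; i++) {
       db.data.insert(records[i]);
       if ((i + 1) % 1000 === 0) {
           // Block until at least one secondary has the write (w: 2),
           // waiting up to 30 seconds.
           db.getLastError(2, 30000);
       }
   }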

Failover and Recovery
~~~~~~~~~~~~~~~~~~~~~

2 changes: 1 addition & 1 deletion source/core/replication-internals.txt
@@ -25,7 +25,7 @@ replicate this log by applying the operations to themselves in an
asynchronous process. Under normal operation, :term:`secondary` members
reflect writes within one second of the primary. However, various
exceptional situations may cause secondaries to lag behind further. See
:term:`replication lag` for details.
:ref:`Replication Lag <replica-set-replication-lag>` for details.

All members send heartbeats (pings) to all other members in the set and can
import operations to the local oplog from any other member in the set.