From ba2a824965fb5b277e622cb46fe376fcc9fa26ad Mon Sep 17 00:00:00 2001
From: Bob Grabar
Date: Wed, 29 Aug 2012 14:01:31 -0400
Subject: [PATCH 1/4] DOCS-458 addition for faq sharding

---
 draft/faq-sharding-addition.txt | 76 +++++++++++++++------------------
 1 file changed, 35 insertions(+), 41 deletions(-)

diff --git a/draft/faq-sharding-addition.txt b/draft/faq-sharding-addition.txt
index 7d2986f1249..f6c0b60c08c 100644
--- a/draft/faq-sharding-addition.txt
+++ b/draft/faq-sharding-addition.txt
@@ -1,54 +1,48 @@
-What are the best ways to successfully insert larger volumes of data into as sharded collection?
--------------------------------------------------------------------------------------------------
+For high-volume inserts, when is it necessary to first pre-split data?
+----------------------------------------------------------------------

-- what is pre-splitting
+Whether to pre-split before a high-volume insert depends on the
+:term:`shard key`, the existing distribution of :term:`chunks `,
+and how evenly distributed the insert operation is.

-  In sharded environments, MongoDB distributes data into :term:`chunks
-  `, each defined by a range of shard key values. Pre-splitting is a command run
-  prior to data insertion that specifies the shard key values on which to split up chunks.
+In the following cases, we recommend pre-splitting before a large insert:

-- Pre-splitting is useful before large inserts into a sharded collection when:
+- Inserting data into an empty collection

-1. inserting data into an empty collection
+  If a collection is empty, the database takes time to determine the
+  optimal key distribution. If you insert many documents in rapid
+  succession, MongoDB initially directs writes to a single chunk, which
+  can affect performance. Predefining splits improves write performance
+  in the early stages of a bulk import by eliminating the database's
+  "learning" period.

-If a collection is empty, the database takes time to determine the optimal key
-distribution. If you insert many documents in rapid succession, MongoDB will initially
-direct writes to a single chunk, potentially having significant impacts on performance.
-Predefining splits may improve write performance in the early stages of a bulk import by
-eliminating the database's "learning" period.
+- Data is not evenly distributed

-2. data is not evenly distributed
+  Even if the sharded collection contains existing documents balanced
+  over multiple chunks, :term:`pre-splitting` is beneficial if the write
+  operation itself isn't evenly distributed, i.e., if the inserts
+  include shard-key values that are contained on only a small number of
+  chunks. By pre-splitting and using an increasing shard key, you can
+  prevent writes from monopolizing a single :term:`shard`.

-Even if the sharded collection was previously populated with documents and contains multiple
-chunks, pre-splitting may be beneficial if the write operation isn't evenly distributed, in
-other words, if the inserts have shard keys values contained on a small number of chunks.
+- Monotonically increasing shard key

-3. monotomically increasing shard key
+  If you attempt to insert data with monotonically increasing shard
+  keys, the writes will always occur on the last chunk in the
+  collection. Predefining splits helps to cycle a large write operation
+  around the cluster; however, pre-splitting in this instance will not
+  prevent consecutive inserts from hitting a single shard (a manual
+  pre-split is sketched below).
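+
+For illustration, a manual pre-split might look like the following
+sketch. The ``records.people`` namespace, the ``zipcode`` shard key,
+and the split point are hypothetical:
+
+.. code-block:: javascript
+
+   use admin
+   // define a chunk boundary at a chosen shard key value
+   db.runCommand( { split : "records.people" , middle : { zipcode : "47591" } } )
+   // move the chunk containing that value to a specific shard
+   db.runCommand( { moveChunk : "records.people" , find : { zipcode : "47591" } ,
+                    to : "shard0001" } )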
-If you attempt to insert data with monotonically increasing shard keys, the writes will -always hit the last chunk in the collection. Predefining splits may help to cycle a large -write operation around the cluster; however, pre-splitting in this instance will not -prevent consecutive inserts from hitting a single shard. +Pre-splitting might *not* be necessary in the following cases: -- when does it not matter +- If data insertion is not rapid, MongoDB may have enough time to split + and migrate chunks without affecting performance. -If data insertion is not rapid, MongoDB may have enough time to split and migrate chunks without -impacts on performance. In addition, if the collection already has chunks with an even key -distribution, pre-splitting may not be necessary. +- If the collection already has chunks with an even key distribution, + pre-splitting may not be necessary. -See the ":doc:`/tutorial/inserting-documents-into-a-sharded-collection`" tutorial for more -information. +For more information, see :doc:`/tutorial/inserting-documents-into-a-sharded-collection`. - -Is it necessary to pre-split data before high volume inserts into a sharded collection? ---------------------------------------------------------------------------------------- - -The answer depends on the shard key, the existing distribution of chunks, and how -evenly distributed the insert operation is. If a collection is empty prior to a -bulk insert, the database will take time to determine the optimal key -distribution. Predefining splits improves write performance in the early stages -of a bulk import. - -Pre-splitting is also important if the write operation isn't evenly distributed. -When using an increasing shard key, for example, pre-splitting data can prevent -writes from hammering a single shard. +.. SK, I flipped the above sentence, which could instead read: +.. See :doc:`/tutorial/inserting-documents-into-a-sharded-collection` for more information. +.. I prefer the former, but I think you prefer the latter. Let me know. -BG From 8ecece2593ca78a597b93e3bd42b8dc864b7a7d4 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Wed, 29 Aug 2012 17:56:59 -0400 Subject: [PATCH 2/4] DOCS-458 ongoing edits to pre-splitting doc --- ...ng-documents-into-a-sharded-collection.txt | 234 ++++++++++++------ 1 file changed, 152 insertions(+), 82 deletions(-) diff --git a/draft/tutorial/inserting-documents-into-a-sharded-collection.txt b/draft/tutorial/inserting-documents-into-a-sharded-collection.txt index d0e751c7bef..5caa0a26e95 100644 --- a/draft/tutorial/inserting-documents-into-a-sharded-collection.txt +++ b/draft/tutorial/inserting-documents-into-a-sharded-collection.txt @@ -2,83 +2,112 @@ Inserting Documents into a Sharded Collection ============================================= -Shard Keys ----------- +Types of Distribution +--------------------- -.. TODO +MongoDB distributes inserted data in one of three ways, depending on a +combination of :term:`shard keys `, the distribution of +existing :term:`chunks `, and the distribution and volume of the +inserted data. MongoDB distributes data on one of the following ways: - outline the ways that insert operations work given shard keys of the following types +- MongoDB distributes the data evenly around the cluster. For details + see :ref:`sharding-even-distribution`. +- MongoDB directs writes unevenly or directs all writes to a single + chunk. For details see :ref:`sharding-uneven-distribution`. +- MongoDB inserts the data into the last chunk in the cluster. 
For + details see :ref:`sharding-monotonically-increasing-keys`. +.. _sharding-even-distribution: Even Distribution ~~~~~~~~~~~~~~~~~ -If the data's distribution of keys is evenly, MongoDB should be able to distribute writes -evenly around a the cluster once the chunk key ranges are established. MongoDB will -automatically split chunks when they grow to a certain size (~64 MB by default) and will -balance the number of chunks across shards. +MongoDB distributes writes evenly around the cluster in the following +cases: + +- Data has been pre-split with evenly distributed :term:`shard keys `, + which MongoDB then establishes and then uses to distribute writes + evenly. For details on pre-splitting data with evenly distributed + keys, see the :ref:`sharding-pre-splitting` section below. + +- The sharded collection contains existing documents balanced over + multiple chunks *and* the inserted data is either low volume or evenly + distributed. + +In even distributions, MongoDB automatically splits chunks when they +grow to a certain size (~64 MB by default) and automatically balances +chunks across shards, allowing no more than eight chunks on a shard. -When inserting data into a new collection, it may be important to pre-split the key ranges. -See the section below on pre-splitting and manually moving chunks. +.. _sharding-uneven-distribution: Uneven Distribution ~~~~~~~~~~~~~~~~~~~ -If you need to insert data into a collection that isn't evenly distributed, or if the shard -keys of the data being inserted aren't evenly distributed, you may need to pre-split your -data before doing high volume inserts. As an example, consider a collection sharded by last -name with the following key distribution: +MongoDB writes data unevenly in the following cases. To avoid each of +these cases, pre-split your data, as described the +:ref:`sharding-pre-splitting` section below: -(-∞, "Henri") -["Henri", "Peters") -["Peters",+∞) +- You insert a large volume of data that isn't evenly distributed. In + this case, even if the sharded cluster contains existing documents + balanced over multiple chunks, the inserts might include values that + are contained only on a small number of chunks. -Although the chunk ranges may be split evenly, inserting lots of users with with a common -last name such as "Smith" will potentially hammer a single shard. Making the chunk range -more granular in this portion of the alphabet may improve write performance. +- The collection is empty, and the data is not evenly distributed. + MongoDB fills one chunk before creating the next and maximizes the + number of chunks on a shard (8), before creating a new shard. -(-∞, "Henri") -["Henri", "Peters") -["Peters", "Smith"] -["Smith", "Tyler"] -["Tyler",+∞) +- Neither the collection nor the data are evenly distributed. MongoDB + might write to chunks unevenly. -Monotonically Increasing Values -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +- The sharded collection contains existing documents balanced over + multiple chunks *and* the inserted data is either low volume or evenly + distributed. -.. what will happen if you try to do inserts. -Documents with monotonically increasing shard keys, such as the BSON ObjectID, will always -be inserted into the last chunk in a collection. To illustrate why, consider a sharded -collection with two chunks, the second of which has an unbounded upper limit. +.. 
_sharding-monotonically-increasing-keys: -(-∞, 100) -[100,+∞) - -If the data being inserted has an increasing key, at any given time writes will always hit -the shard containing the chunk with the unbounded upper limit, a problem that is not -alleviated by splitting the "hot" chunk. High volume inserts, therefore, could hinder the -cluster's performance by placing a significant load on a single shard. +Monotonically Increasing Values +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -If, however, a single shard can handle the write volume, an increasing shard key may have -some advantages. For example, if you need to do queries based on document insertion time, -sharding on the ObjectID ensures that documents created around the same time exist on the -same shard. Data locality helps to improve query performance. +.. what will happen if you try to do inserts. -If you decide to use an monotonically increasing shard key and anticipate large inserts, -one solution may be to store the hash of the shard key as a separate field. Hashing may -prevent the need to balance chunks by distributing data equally around the cluster. You can -create a hash client-side. In the future, MongoDB may support automatic hashing: +Documents with monotonically increasing shard keys, such as the BSON +ObjectID, will always be inserted into the last chunk in a collection. +To illustrate why, consider a sharded collection with two chunks, the +second of which has an unbounded upper limit. + +(-∞, 100) +[100, +∞) + +If the data being inserted has an increasing key, at any given time +writes will always hit the shard containing the chunk with the unbounded +upper limit, a problem that is not alleviated by splitting the "hot" +chunk. High volume inserts, therefore, could hinder the cluster's +performance by placing a significant load on a single shard. + +If, however, a single shard can handle the write volume, an increasing +shard key may have some advantages. For example, if you need to do +queries based on document insertion time, sharding on the ObjectID +ensures that documents created around the same time exist on the same +shard. Data locality helps to improve query performance. + +If you decide to use an monotonically increasing shard key and +anticipate large inserts, one solution may be to store the hash of the +shard key as a separate field. Hashing may prevent the need to balance +chunks by distributing data equally around the cluster. You can create a +hash client-side. In the future, MongoDB may support automatic hashing: https://jira.mongodb.org/browse/SERVER-2001 Operations ---------- -.. TODO +.. TODO + +.. outline the procedures and rationale for each process. - outline the procedures and rationale for each process. +.. _sharding-pre-splitting: Pre-Splitting ~~~~~~~~~~~~~ @@ -86,42 +115,74 @@ Pre-Splitting .. when to do this .. procedure for this process -Pre-splitting is the process of specifying shard key ranges for chunks prior to data -insertion. This process may be important prior to large inserts into a sharded collection -as a way of ensuring that the write operation is evenly spread around the cluster. You -should consider pre-splitting before large inserts if the sharded collection is empty, if -the collection's data or the data being inserted is not evenly distributed, or if the shard -key is monotonically increasing. +Pre-splitting is the process of specifying shard key ranges for chunks +prior to data insertion. Pre-splitting ensures the write operation is +evenly spread around the cluster. 
You should consider pre-splitting if: + +- You are doing high volume inserts. -In the example below the pre-split command splits the chunk where the _id 99 would reside -using that key as the split point. Note that a key need not exist for a chunk to use it in -its range. The chunk may even be empty. +- The sharded collection is empty. -The first step is to create a sharded collection to contain the data, which can be done in -three steps: +- Either the collection's data or the data being inserted is not evenly distributed. + +- The shard key is monotonically increasing. + +As an example of pre-splitting, consider a collection sharded by last +name with the following key distribution. The "(" and ")" symbols +indicate a non-inclusive value. The "[" and "]" symbols indicate an +inclusive value: + +["A", "Jones") +["Jones", "Smith") +["Smith", "zzz") + +Although the chunk ranges may be split evenly, inserting lots of users +with with a common last name, such as "Jones" or "Smith", will +potentially monopolize a single shard. Making the chunk range more +granular in these portions of the alphabet may improve write +performance. + +["A", "Jones") +["Jones", "Peters") +["Peters", "Smith") +["Smith", "Tyler") +["Tyler", "zzz"] + +Procedure +--------- + +In the example below the pre-split command splits the chunk where the +_id 99 would reside using that key as the split point. Note that a key +need not exist for a chunk to use it in its range. The chunk may even be +empty. + +The first step is to create a sharded collection to contain the data, +which can be done in three steps: > use admin > db.runCommand({ enableSharding : "foo" }) -Next, we add a unique index to the collection "foo.bar" which is required for the shard -key. +Next, we add a unique index to the collection "foo.bar" which is +required for the shard key. > use foo > db.bar.ensureIndex({ _id : 1 }, { unique : true }) -Finally we shard the collection (which contains no data) using the _id value. +Finally we shard the collection (which contains no data) using the _id +value. > use admin switched to db admin > db.runCommand( { split : "test.foo" , middle : { _id : 99 } } ) -Once the key range is specified, chunks can be moved around the cluster using the moveChunk -command. +Once the key range is specified, chunks can be moved around the cluster +using the moveChunk command. > db.runCommand( { moveChunk : "test.foo" , find : { _id : 99 } , to : "shard1" } ) -You can repeat these steps as many times as necessary to create or move chunks around the -cluster. To get information about the two chunks created in this example: +You can repeat these steps as many times as necessary to create or move +chunks around the cluster. To get information about the two chunks +created in this example: > db.printShardingStatus() --- Sharding Status --- @@ -130,32 +191,41 @@ cluster. 
To get information about the two chunks created in this example: { "_id" : "shard0000", "host" : "localhost:30000" } { "_id" : "shard0001", "host" : "localhost:30001" } databases: - { "_id" : "admin", "partitioned" : false, "primary" : "config" } - { "_id" : "test", "partitioned" : true, "primary" : "shard0001" } - test.foo chunks: - shard0001 1 - shard0000 1 - { "_id" : { "$MinKey" : true } } -->> { "_id" : "99" } on : shard0001 { "t" : 2000, "i" : 1 } - { "_id" : "99" } -->> { "_id" : { "$MaxKey" : true } } on : shard0000 { "t" : 2000, "i" : 0 } + { "_id" : "admin", "partitioned" : false, "primary" : "config" } + { "_id" : "test", "partitioned" : true, "primary" : "shard0001" } + test.foo chunks: + shard0001 1 + shard0000 1 + { "_id" : { "$MinKey" : true } } -->> { "_id" : "99" } on : shard0001 { "t" : 2000, "i" : 1 } + { "_id" : "99" } -->> { "_id" : { "$MaxKey" : true } } on : shard0000 { "t" : 2000, "i" : 0 } Once the chunks and the key ranges are evenly distributed, you can proceed with a high volume insert. -Changing Shard Key +.. _sharding-changing-shard-key: + +Changing Shard Key ~~~~~~~~~~~~~~~~~~ -There is no automatic support for changing the shard key for a collection. In addition, -since a document's location within the cluster is determined by its shard key value, -changing the shard key could force data to move from machine to machine, potentially a -highly expensive operation. +There is no automatic support for changing the shard key for a +collection. In addition, since a document's location within the cluster +is determined by its shard key value, changing the shard key could force +data to move from machine to machine, potentially a highly expensive +operation. Thus it is very important to choose the right shard key up front. -If you do need to change a shard key, an export and import is likely the best solution. -Create a new pre-sharded collection, and then import the exported data to it. If desired -use a dedicated mongos for the export and the import. +If you do need to change a shard key, an export and import is likely the +best solution. Create a new pre-sharded collection, and then import the +exported data to it. If desired use a dedicated mongos for the export +and the import. -https://jira.mongodb.org/browse/SERVER-4000 +.. :issue:`SERVER-4000` + +.. _sharding-pre-allocating-documents: Pre-allocating Documents ~~~~~~~~~~~~~~~~~~~~~~~~ + +.. http://docs.mongodb.org/manual/use-cases/pre-aggregated-reports/#pre-allocate + From 05ac055e19e048801aaccdb86a8a6048f288a5c2 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Thu, 30 Aug 2012 13:43:35 -0400 Subject: [PATCH 3/4] DOCS-458 pre-split, ongoing edits --- ...ng-documents-into-a-sharded-collection.txt | 138 ++++++++++-------- 1 file changed, 80 insertions(+), 58 deletions(-) diff --git a/draft/tutorial/inserting-documents-into-a-sharded-collection.txt b/draft/tutorial/inserting-documents-into-a-sharded-collection.txt index 5caa0a26e95..b57fc3dc4e2 100644 --- a/draft/tutorial/inserting-documents-into-a-sharded-collection.txt +++ b/draft/tutorial/inserting-documents-into-a-sharded-collection.txt @@ -2,109 +2,131 @@ Inserting Documents into a Sharded Collection ============================================= +Synopsis +-------- + +When inserting documents into a :term:`sharded ` +:term:`collection`, you must consider how MongoDB will distribute the +inserted data and how the distribution will affect performance. You must +also consider whether you need first to :term:`pre-split +` the data. 
This document describes types of distribution
+and how to pre-split data.
+
 Types of Distribution
 ---------------------

-MongoDB distributes inserted data in one of three ways, depending on a
-combination of :term:`shard keys `, the distribution of
-existing :term:`chunks `, and the distribution and volume of the
-inserted data. MongoDB distributes data on one of the following ways:
+MongoDB distributes inserted data in one of three ways, depending on
+the following factors: the distribution of the :term:`collection's `
+existing :term:`chunks `, the existence of
+:term:`shard keys ` on the write operation (i.e., whether
+data is :term:`pre-split `), the distribution of the
+inserted data, and the volume of the inserted data.
+
+Depending on the above, MongoDB distributes inserted data in one of the
+following ways:

 - MongoDB distributes the data evenly around the cluster. For details
   see :ref:`sharding-even-distribution`.

-- MongoDB directs writes unevenly or directs all writes to a single
-  chunk. For details see :ref:`sharding-uneven-distribution`.
+- MongoDB distributes data unevenly around the cluster. For details see
+  :ref:`sharding-uneven-distribution`.

-- MongoDB inserts the data into the last chunk in the cluster. For
-  details see :ref:`sharding-monotonically-increasing-keys`.
+- MongoDB inserts all data into the last chunk in the cluster. For
+  details see :ref:`sharding-monotonic-distribution`.

 .. _sharding-even-distribution:

 Even Distribution
 ~~~~~~~~~~~~~~~~~

-MongoDB distributes writes evenly around the cluster in the following
-cases:
-
-- Data has been pre-split with evenly distributed :term:`shard keys `,
-  which MongoDB then establishes and then uses to distribute writes
-  evenly. For details on pre-splitting data with evenly distributed
-  keys, see the :ref:`sharding-pre-splitting` section below.
+In even distribution, MongoDB balances writes among all :term:`chunks `
+and :term:`shards` in the cluster. Even distribution provides the best
+performance and occurs in the following cases:

-- The sharded collection contains existing documents balanced over
-  multiple chunks *and* the inserted data is either low volume or evenly
-  distributed.
+- The inserted data has been :term:`pre-split ` by specifying
+  :term:`shard keys ` in the write operation. When you insert
+  the data, MongoDB uses the keys to distribute writes evenly. It does
+  not matter whether the existing :term:`collection` is evenly distributed. For
+  details on pre-splitting data, see the :ref:`sharding-pre-splitting`
+  section below.

-In even distributions, MongoDB automatically splits chunks when they
-grow to a certain size (~64 MB by default) and automatically balances
-chunks across shards, allowing no more than eight chunks on a shard.
+- The :term:`sharded ` collection contains existing documents
+  balanced over multiple chunks *and* the inserted data is either low
+  volume or already evenly distributed. If the data is large and
+  unevenly distributed, the write operation becomes imbalanced and
+  monopolizes certain chunks.

 .. _sharding-uneven-distribution:

 Uneven Distribution
 ~~~~~~~~~~~~~~~~~~~

-MongoDB writes data unevenly in the following cases. To avoid each of
-these cases, pre-split your data, as described the
-:ref:`sharding-pre-splitting` section below:
+In uneven distribution, MongoDB focuses write operations on a small
+number of :term:`chunks ` instead of balancing writes across
+chunks.
This increases the likelihood that chunks will fill and that
+MongoDB must move chunks between :term:`shards`, an operation that slows
+performance.
+
+To avoid uneven distribution, :term:`pre-split` your data, as described
+in the :ref:`sharding-pre-splitting` section below.

-- You insert a large volume of data that isn't evenly distributed. In
-  this case, even if the sharded cluster contains existing documents
-  balanced over multiple chunks, the inserts might include values that
-  are contained only on a small number of chunks.
+Uneven distribution occurs in the following cases:

-- The collection is empty, and the data is not evenly distributed.
-  MongoDB fills one chunk before creating the next and maximizes the
-  number of chunks on a shard (8), before creating a new shard.
+- You insert a large volume of data that is not evenly distributed. Even
+  if the :term:`sharded cluster ` contains existing
+  documents balanced over multiple chunks, the inserted data might
+  include values that write disproportionately to a small number of
+  chunks.

-- Neither the collection nor the data are evenly distributed. MongoDB
-  might write to chunks unevenly.
+- The collection is empty, *and* the inserted data is not evenly
+  distributed. MongoDB fills one chunk before creating the next and
+  eventually must rebalance and move chunks between shards. Moving
+  chunks between shards affects performance.

-- The sharded collection contains existing documents balanced over
-  multiple chunks *and* the inserted data is either low volume or evenly
-  distributed.
+- Neither the collection nor the inserted data are evenly distributed.
+  MongoDB fills certain chunks too soon and eventually must rebalance
+  and move chunks between shards. Moving chunks between shards affects
+  performance.

+.. _sharding-monotonic-distribution:

-.. _sharding-monotonically-increasing-keys:
+All Inserts are to Last Chunk (Monotonic)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Monotonically Increasing Values
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When you insert documents with monotonically increasing
+:term:`shard keys `, such as the BSON ObjectID, MongoDB
+inserts all data into the last :term:`chunk` in the collection.

-.. what will happen if you try to do inserts.
+For an example, consider a :term:`sharded ` collection with
+two chunks, the second of which has an unbounded upper limit. The "("
+and ")" symbols indicate a non-inclusive value. The "[" and "]" symbols
+indicate an inclusive value:

-Documents with monotonically increasing shard keys, such as the BSON
-ObjectID, will always be inserted into the last chunk in a collection.
-To illustrate why, consider a sharded collection with two chunks, the
-second of which has an unbounded upper limit.
+(-infinity, 100)
+[100, +infinity)

-(-∞, 100)
-[100, +∞)
+If the data being inserted has an increasing key, then writes always
+go to the shard containing the chunk with the unbounded upper limit.

-If the data being inserted has an increasing key, at any given time
-writes will always hit the shard containing the chunk with the unbounded
-upper limit, a problem that is not alleviated by splitting the "hot"
-chunk. High volume inserts, therefore, could hinder the cluster's
-performance by placing a significant load on a single shard.
+Inserting to the last chunk can hinder the cluster's performance by placing a
+significant load on a single :term:`shard `.

 If, however, a single shard can handle the write volume, an increasing
 shard key may have some advantages.
For example, if you need to do -queries based on document insertion time, sharding on the ObjectID +queries based on document-insertion time, sharding on the ObjectID ensures that documents created around the same time exist on the same shard. Data locality helps to improve query performance. -If you decide to use an monotonically increasing shard key and +If you decide to use a monotonically increasing shard key and anticipate large inserts, one solution may be to store the hash of the shard key as a separate field. Hashing may prevent the need to balance chunks by distributing data equally around the cluster. You can create a -hash client-side. In the future, MongoDB may support automatic hashing: -https://jira.mongodb.org/browse/SERVER-2001 +hash client-side. Operations ---------- .. TODO - .. outline the procedures and rationale for each process. .. _sharding-pre-splitting: @@ -112,10 +134,10 @@ Operations Pre-Splitting ~~~~~~~~~~~~~ -.. when to do this +.. TODO .. procedure for this process -Pre-splitting is the process of specifying shard key ranges for chunks +Pre-splitting is the process of specifying shard key ranges for :term:`chunks ` prior to data insertion. Pre-splitting ensures the write operation is evenly spread around the cluster. You should consider pre-splitting if: From 183d102176d350c02328ca0973cf3a4cb240f08a Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Thu, 30 Aug 2012 17:52:10 -0400 Subject: [PATCH 4/4] DOCS-458 draft pre-splitting document --- ...ng-documents-into-a-sharded-collection.txt | 158 +++++++++--------- 1 file changed, 83 insertions(+), 75 deletions(-) diff --git a/draft/tutorial/inserting-documents-into-a-sharded-collection.txt b/draft/tutorial/inserting-documents-into-a-sharded-collection.txt index b57fc3dc4e2..d620103f068 100644 --- a/draft/tutorial/inserting-documents-into-a-sharded-collection.txt +++ b/draft/tutorial/inserting-documents-into-a-sharded-collection.txt @@ -12,6 +12,14 @@ also consider whether you need first to :term:`pre-split ` the data. This document describes types of distribution and how to pre-split data. +.. seealso:: + + - :ref:`chunk-management` + - :doc:`/core/sharding-internals` + - :doc:`/core/sharding` + +.. _sharding-distribution-types: + Types of Distribution --------------------- @@ -80,30 +88,31 @@ Uneven distribution occurs in the following cases: - The collection is empty, *and* the inserted data is not evenly distributed. MongoDB fills one chunk before creating the next and - eventually must rebalance and move chunks between shards. Moving - chunks between shards affects performance. + eventually must rebalance and move chunks between shards, slowing + performance. - Neither the collection nor the inserted data are evenly distributed. MongoDB fills certain chunks too soon and eventually must rebalance - and move chunks between shards. Moving chunks between shards affects - performance. + and move chunks between shards, slowing performance. .. _sharding-monotonic-distribution: -All Inserts are to Last Chunk (Monotonic) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +All Writes Go to Last Chunk (Monotonic) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When you insert documents with monotonically increasing :term:`shard keys `, such as the BSON ObjectID, MongoDB inserts all data into the last :term:`chunk` in the collection. -For an example, consider a :term:`sharded ` collection with +For example, consider a :term:`sharded ` collection with two chunks, the second of which has an unbounded upper limit. 
The "(" and ")" symbols indicate a non-inclusive value. The "[" and "]" symbols indicate an inclusive value: -(-infinity, 100) -[100, +infinity) +.. code-block:: sh + + (-infinity, 100) + [100, +infinity) If the data being inserted has an increasing key, then writes always direct to the shard containing the chunk with the unbounded upper limit. @@ -123,103 +132,102 @@ shard key as a separate field. Hashing may prevent the need to balance chunks by distributing data equally around the cluster. You can create a hash client-side. -Operations ----------- - -.. TODO -.. outline the procedures and rationale for each process. - .. _sharding-pre-splitting: +Operation +--------- + Pre-Splitting ~~~~~~~~~~~~~ -.. TODO -.. procedure for this process - -Pre-splitting is the process of specifying shard key ranges for :term:`chunks ` -prior to data insertion. Pre-splitting ensures the write operation is -evenly spread around the cluster. You should consider pre-splitting if: +Pre-splitting is the process of specifying :term:`shard key` ranges for +:term:`chunks ` prior to data insertion in order to ensure +MongoDB distributes the data evenly around the cluster. You should +consider pre-splitting if: - You are doing high volume inserts. -- The sharded collection is empty. +- The :term:`sharded ` collection is empty. - Either the collection's data or the data being inserted is not evenly distributed. - The shard key is monotonically increasing. -As an example of pre-splitting, consider a collection sharded by last -name with the following key distribution. The "(" and ")" symbols -indicate a non-inclusive value. The "[" and "]" symbols indicate an -inclusive value: +For more information on how MongoDB distributes inserted data, see +:ref:`sharding-distribution-types`. -["A", "Jones") -["Jones", "Smith") -["Smith", "zzz") +As an example of when you might choose to pre-split, consider a +collection sharded by last name with the following key distribution. The +"(" and ")" symbols indicate a non-inclusive value. The "[" and "]" +symbols indicate an inclusive value: -Although the chunk ranges may be split evenly, inserting lots of users -with with a common last name, such as "Jones" or "Smith", will -potentially monopolize a single shard. Making the chunk range more -granular in these portions of the alphabet may improve write -performance. +.. code-block:: sh -["A", "Jones") -["Jones", "Peters") -["Peters", "Smith") -["Smith", "Tyler") -["Tyler", "zzz"] + ["A", "Jones") + ["Jones", "Smith") + ["Smith", "zzz") -Procedure ---------- +Although the chunk ranges in the collection may be evenly split, +inserting a large number of new users with a common last name, such as +"Smith" or "Jones", will write disproportionately to a single shard, +monopolizing writes and reads to that shard. -In the example below the pre-split command splits the chunk where the -_id 99 would reside using that key as the split point. Note that a key -need not exist for a chunk to use it in its range. The chunk may even be -empty. +To avoid this situation, you would make the chunk range more granular: -The first step is to create a sharded collection to contain the data, -which can be done in three steps: +.. code-block:: sh -> use admin -> db.runCommand({ enableSharding : "foo" }) + ["A", "Jones") + ["Jones", "Parker") + ["Parker", "Smith") + ["Smith", "Tyler") + ["Tyler", "zzz"] -Next, we add a unique index to the collection "foo.bar" which is -required for the shard key. 
+Procedures
+----------

-> use admin
-> db.runCommand({ enableSharding : "foo" })
+Pre-Splitting
+~~~~~~~~~~~~~

-Next, we add a unique index to the collection "foo.bar" which is
-required for the shard key.
+This procedure assumes:

-> use foo
-> db.bar.ensureIndex({ _id : 1 }, { unique : true })
+- You have a database "test".
+
+- The "test" database has a collection "foo".
+
+- The "foo" collection has a unique index on the shard key.
+
+- The "foo" collection, which contains no data, is sharded on the _id
+  field and has been split at the shard key value _id : 99.

-Finally we shard the collection (which contains no data) using the _id
-value.
+Note that the value used as a split point need not exist in any
+document; the chunk may even be empty.

-> use admin
-switched to db admin
-> db.runCommand( { split : "test.foo" , middle : { _id : 99 } } )

 Once the key range is specified, chunks can be moved around the cluster
 using the moveChunk command.

-> db.runCommand( { moveChunk : "test.foo" , find : { _id : 99 } , to : "shard1" } )
+.. code-block:: javascript
+
+   db.runCommand( { moveChunk : "test.foo" , find : { _id : 99 } , to : "shard1" } )

 You can repeat these steps as many times as necessary to create or move
 chunks around the cluster. To get information about the two chunks
 created in this example:

-> db.printShardingStatus()
---- Sharding Status ---
-  sharding version: { "_id" : 1, "version" : 3 }
-  shards:
-      { "_id" : "shard0000", "host" : "localhost:30000" }
-      { "_id" : "shard0001", "host" : "localhost:30001" }
-  databases:
-        { "_id" : "admin", "partitioned" : false, "primary" : "config" }
-        { "_id" : "test", "partitioned" : true, "primary" : "shard0001" }
-        test.foo chunks:
-                shard0001 1
-                shard0000 1
-            { "_id" : { "$MinKey" : true } } -->> { "_id" : "99" } on : shard0001 { "t" : 2000, "i" : 1 }
-            { "_id" : "99" } -->> { "_id" : { "$MaxKey" : true } } on : shard0000 { "t" : 2000, "i" : 0 }
+.. code-block:: javascript
+
+   db.printShardingStatus()
+   --- Sharding Status ---
+   sharding version: { "_id" : 1, "version" : 3 }
+   shards:
+      { "_id" : "shard0000", "host" : "localhost:30000" }
+      { "_id" : "shard0001", "host" : "localhost:30001" }
+   databases:
+        { "_id" : "admin", "partitioned" : false, "primary" : "config" }
+        { "_id" : "test", "partitioned" : true, "primary" : "shard0001" }
+        test.foo chunks:
+                shard0001 1
+                shard0000 1
+            { "_id" : { "$MinKey" : true } } -->> { "_id" : "99" } on : shard0001 { "t" : 2000, "i" : 1 }
+            { "_id" : "99" } -->> { "_id" : { "$MaxKey" : true } } on : shard0000 { "t" : 2000, "i" : 0 }

 Once the chunks and the key ranges are evenly distributed, you can
 proceed with a high volume insert.

@@ -239,8 +247,8 @@ Thus it is very important to choose the right shard key up front.

 If you do need to change a shard key, an export and import is likely the
 best solution. Create a new pre-sharded collection, and then import the
-exported data to it. If desired use a dedicated mongos for the export
-and the import.
+exported data to it. If desired, use a dedicated :program:`mongos` for
+the export and the import.

 .. :issue:`SERVER-4000`
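+
+As an illustrative sketch of this approach, the following copies an
+existing collection into a new collection sharded on a different key.
+The names ``test.foo``, ``test.fooNew``, and ``newKey`` are
+hypothetical, and an in-shell copy stands in here for a dedicated
+export and import:
+
+.. code-block:: javascript
+
+   // shard the new, empty collection on the new key; this assumes
+   // sharding is already enabled for the "test" database
+   use admin
+   db.runCommand( { shardCollection : "test.fooNew" , key : { newKey : 1 } } )
+
+   // pre-split the new collection as described above, then copy the
+   // documents across
+   use test
+   db.foo.find().forEach( function(doc) { db.fooNew.insert(doc) } )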