mongodb · tychoish · Aug 30, 2012 · Aug 9, 2012 · Aug 20, 2012 · Aug 20, 2012
diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt
@@ -125,7 +125,7 @@ more detail or use the following procedure as a quick starting point:
          replicaSetName/<seed1>,<seed2>,<seed3>
 
       For example, if the name of the replica set is "``repl0``", then
-      your :func:`sh.addShard` command would be:
+      your :method:`sh.addShard` command would be:
 
       .. code-block:: javascript
 
@@ -136,7 +136,7 @@ more detail or use the following procedure as a quick starting point:
    MongoDB enables sharding on a per-database basis. This is only a
    meta-data change and will not redistribute your data. To enable
    sharding for a given database, use the :dbcommand:`enableSharding`
-   command or the :func:`sh.enableSharding()` shell function.
+   command or the :method:`sh.enableSharding()` shell function.
 
    .. code-block:: javascript
 
@@ -165,7 +165,7 @@ more detail or use the following procedure as a quick starting point:
    collections must belong to a database for which you have enabled
    sharding. When you shard a collection, you also choose the shard
    key. To shard a collection, run the :dbcommand:`shardCollection`
-   command or the :func:`sh.shardCollection()` shell helper.
+   command or the :method:`sh.shardCollection()` shell helper.
 
    .. code-block:: javascript
 
@@ -223,7 +223,7 @@ procedure:
 
 #. First, you need to tell the cluster where to find the individual
    shards. You can do this using the :dbcommand:`addShard` command or
-   the :func:`sh.addShard()` helper:
+   the :method:`sh.addShard()` helper:
 
    .. code-block:: javascript
 
@@ -266,7 +266,7 @@ procedure:
       following form: the replica set name, followed by a forward
       slash, followed by a comma-separated list of seeds for the
       replica set. For example, if the name of the replica set is
-      "myapp1", then your :func:`sh.addShard` command might resemble:
+      "myapp1", then your :method:`sh.addShard` command might resemble:
 
       .. code-block:: javascript
 
@@ -312,7 +312,7 @@ The procedure to remove a shard is as follows:
    shard name when you first ran the :dbcommand:`addShard` command. If not,
    you can find out the name of the shard by running the
    :dbcommand:`listshards` or :dbcommand:`printShardingStatus`
-   commands or the :func:`sh.status()` shell helper.
+   commands or the :method:`sh.status()` shell helper.
 
    The following examples will remove a shard named ``mongodb0`` from the cluster.
 
@@ -404,41 +404,51 @@ stop the processes comprising the ``mongodb0`` shard.
 Chunk Management
 ----------------
 
-This section describes various operations on
-:term:`chunks <chunk>` in :term:`shard clusters <shard cluster>`. In
-most cases MongoDB automates these processes; however, in some cases,
-particularly when you're setting up a shard cluster, you may
-need to create and manipulate chunks directly.
+This section describes various operations on :term:`chunks <chunk>` in
+:term:`shard clusters <shard cluster>`. MongoDB automates these
+processes; however, in some cases, particularly when you're setting up
+a shard cluster, you may need to create and manipulate chunks
+directly.
 
 .. _sharding-procedure-create-split:
 
 Splitting Chunks
 ~~~~~~~~~~~~~~~~
 
-Normally, MongoDB splits a :term:`chunk` following inserts or updates
-when a chunk exceeds the :ref:`chunk size <sharding-chunk-size>`.
+Normally, MongoDB splits a :term:`chunk` following inserts when a
+chunk exceeds the :ref:`chunk size <sharding-chunk-size>`. Recently
+split chunks may be moved immediately to a new shard if
+:program:`mongos` predicts future insertions will benefit from the
+move.
+
+The MongoDB treats all chunks the same, whether split manually or
+automatically by the system.
+
+.. warning::
+
+   You cannot merge or combine chunks once you have split them.
 
 You may want to split chunks manually if:
 
-- you have a large amount of data in your cluster that is *not* split,
-  as is the case after creating a shard cluster with existing data.
+- you have a large amount of data in your cluster and very few
+  :term:`chunks <chunk>`,
+  as is the case after creating a shard cluster from existing data.
 
 - you expect to add a large amount of data that would
   initially reside in a single chunk or shard.
 
 .. example::
 
-   You plan to insert a large amount of data as the result of an
-   import process with :term:`shard key` values between ``300`` and
-   ``400``, *but* all values of your shard key between ``250`` and
-   ``500`` are within a single chunk.
+   You plan to insert a large amount of data with :term:`shard key`
+   values between ``300`` and ``400``, *but* all values of your shard
+   keys are between ``250`` and ``500`` are in a single chunk.
 
-Use :func:`sh.status()` to determine the current chunks ranges across
-the cluster.
+To determine the current chunk ranges across the cluster, use
+:method:`sh.status()` or :method:`db.printShardingStatus()`. 
 
-To split chunks manually, use either the :func:`sh.splitAt()` or
-:func:`sh.splitFind()` helpers in the :program:`mongo` shell.
-These helpers wrap the :dbcommand:`split` command.
+To split chunks manually, use the :dbcommand:`split` command with
+operators: ``middle`` and ``find``. The equivalent shell helpers are
+:method:`sh.splitAt()` or :method:`sh.splitFind()`.
 
 .. example::
 
@@ -449,29 +459,97 @@ These helpers wrap the :dbcommand:`split` command.
 
       sh.splitFind( { "zipcode": 63109 } )
 
-:func:`sh.splitFind()` will split the chunk that contains the *first* document returned
-that matches this query into two equal components. MongoDB will split
-the chunk so that documents that have half of the shard keys in will
-be in one chunk and the documents that have other half of the shard
-keys will be a second chunk. The query in :func:`sh.splitFind()` need
-not contain the shard key, though it almost always makes sense to
-query for the shard key in this case, and including the shard key will
-expedite the operation.
+:method:`sh.splitFind()` will split the chunk containing the queried
+document into two equal sized chunks, dividing the chunk using the
+balancer algorithm. The :method:`sh.splitFind()` query may not be based
+on the shard key, but it makes sense to use the shard key, and
+including the shard key will expedite the operation.
+
+Use :method:`sh.splitAt()` to split a chunk in two using the queried
+document as the partition point:
+
+.. code-block:: javascript
+
+   sh.splitAt( { "zipcode": 63109 } )
 
 However, the location of the document that this query finds with
 respect to the other documents in the chunk does not affect how the
 chunk splits.
 
-Use :func:`sh.splitAt()` to split a chunk in two using the queried
-document as the partition point:
+Pre-Split
+~~~~~~~~~
 
-.. code-block:: javascript
+Splitting chunks will improve shard cluster performance when importing
+data such as:
 
-   sh.splitAt( { "zipcode": 63109 } )
+- migrating data from another system to a shard cluster
 
-.. warning::
+- performing a full shard cluster restore from back up
 
-   You cannot merge or combine chunks once you have split them.
+In such cases, data import to an empty MongoDB shard cluster can
+become slower than to a replica set. The reason for this is how
+MongoDB writes data and manage chunks in a shard cluster.
+
+MongoDB inserts data into a chunk until it becomes large and splits on
+the same shard. If the balancer notices a chunk imbalance between
+shards, a migration process will begin to evenly distribute chunks.
+
+Migrating chunks between shards is extremely resource intensive as
+shard members must notify, copy, update, and delete chunks between
+each other and the configuration database. With a high volume import,
+this migration process can reduce system performance so that imports
+cannot occur because of migrations.
+
+To improve import performance, manually split and migrate chunks in an
+empty shard cluster beforehand. This allows the shard cluster to only
+write import data instead of trying to manage migrations and write
+data.
+
+To prepare your shard cluster for data import, split and migrate
+empty chunks.
+
+#. Split empty chunks in your collection by manually performing
+   :dbcommand:`split` command on chunks.
+
+   .. example::
+
+      To pre-split chunks for 100 million user profiles sharded by
+      email address for 5 shards, run the following commands in the
+      mongo shell:
+
+        .. code-block:: javascript
+
+           for ( var x=97; x<97+26; x++ ){
+             for( var y=97; y<97+26; y+=6 ) {
+               var prefix = String.fromCharCode(x) + String.fromCharCode(y);
+               db.runCommand( { split : "myapp.users" , middle : { email : prefix } } );
+             }
+           }
+
+#. Move chunks to different shard manually using the
+   :dbcommand:`moveChunk` command.
+
+   .. example::
+
+      To migrate all the chunks created for 100 million user profiles
+      evenly, putting each prefix chunk on the next shard from the
+      other , run the following commands in the mongo shell:
+
+        .. code-block:: javascript
+
+	   var shServer = [ "sh0.example.net", "sh1.example.net", "sh2.example.net", "sh3.example.net", "sh4.example.net" ];
+           for ( var x=97; x<97+26; x++ ){
+             for( var y=97; y<97+26; y+=6 ) {
+               var prefix = String.fromCharCode(x) + String.fromCharCode(y);
+               db.adminCommand({moveChunk : "myapp.users", find : {email : prefix}, to : shServer[(y-97)/6]})
+	     }
+	   }
+
+   Optionally, you can also let the balancer automatically
+   redistribute chunks in your shard cluster.
+
+When empty chunks are distributed, data import can occur with multiple
+:program:`mongos` instances, improving overall import performance.
 
 .. _sharding-balancing-modify-chunk-size:
 
@@ -496,7 +574,7 @@ To modify the chunk size, use the following procedure:
 
       use config
 
-#. Issue the following :func:`save() <db.collection.save()>`
+#. Issue the following :method:`save() <db.collection.save()>`
    operation:
 
    .. code-block:: javascript
@@ -640,7 +718,7 @@ Here's what this tells you:
 
   .. note::
 
-     Use the :func:`sh.isBalancerRunning()` helper in the
+     Use the :method:`sh.isBalancerRunning()` helper in the
      :program:`mongo` shell to determine if the balancer is
      running, as follows:
 
@@ -668,7 +746,7 @@ be able to migrate chunks:
 
       use config
 
-#. Use an operation modeled on the following example :func:`update()
+#. Use an operation modeled on the following example :method:`update()
    <db.collection.update()>` operation to modify the balancer's
    window:
 
@@ -722,10 +800,10 @@ all migration, use the following procedure:
 
    If a balancing round is in progress, the system will complete the
    current round before the balancer is officially disabled.  After
-   disabling, you can use the :func:`sh.getBalancerState()` shell
+   disabling, you can use the :method:`sh.getBalancerState()` shell
    function to determine whether the balancer is in fact disabled.
 
-The above process and the :func:`sh.setBalancerState()` helper provide a
+The above process and the :method:`sh.setBalancerState()` helper provide a
 wrapper on the following process, which may be useful if you need to
 run this operation from a driver that does not have helper functions:
 
@@ -1090,6 +1168,6 @@ operation:
   for the duration of the backup procedure.
 
 Confirm that the balancer is not active using the
-:func:`sh.getBalancerState()` method before starting a backup
+:method:`sh.getBalancerState()` method before starting a backup
 operation. When the backup procedure is complete you can reactivate
 the balancer process.