diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt index 2a940f627d1..3c58c0db013 100644 --- a/source/administration/sharding.txt +++ b/source/administration/sharding.txt @@ -125,7 +125,7 @@ more detail or use the following procedure as a quick starting point: replicaSetName/,, For example, if the name of the replica set is "``repl0``", then - your :func:`sh.addShard` command would be: + your :method:`sh.addShard` command would be: .. code-block:: javascript @@ -136,7 +136,7 @@ more detail or use the following procedure as a quick starting point: MongoDB enables sharding on a per-database basis. This is only a meta-data change and will not redistribute your data. To enable sharding for a given database, use the :dbcommand:`enableSharding` - command or the :func:`sh.enableSharding()` shell function. + command or the :method:`sh.enableSharding()` shell function. .. code-block:: javascript @@ -165,7 +165,7 @@ more detail or use the following procedure as a quick starting point: collections must belong to a database for which you have enabled sharding. When you shard a collection, you also choose the shard key. To shard a collection, run the :dbcommand:`shardCollection` - command or the :func:`sh.shardCollection()` shell helper. + command or the :method:`sh.shardCollection()` shell helper. .. code-block:: javascript @@ -223,7 +223,7 @@ procedure: #. First, you need to tell the cluster where to find the individual shards. You can do this using the :dbcommand:`addShard` command or - the :func:`sh.addShard()` helper: + the :method:`sh.addShard()` helper: .. code-block:: javascript @@ -266,7 +266,7 @@ procedure: following form: the replica set name, followed by a forward slash, followed by a comma-separated list of seeds for the replica set. For example, if the name of the replica set is - "myapp1", then your :func:`sh.addShard` command might resemble: + "myapp1", then your :method:`sh.addShard` command might resemble: .. code-block:: javascript @@ -312,7 +312,7 @@ The procedure to remove a shard is as follows: shard name when you first ran the :dbcommand:`addShard` command. If not, you can find out the name of the shard by running the :dbcommand:`listshards` or :dbcommand:`printShardingStatus` - commands or the :func:`sh.status()` shell helper. + commands or the :method:`sh.status()` shell helper. The following examples will remove a shard named ``mongodb0`` from the cluster. @@ -404,41 +404,51 @@ stop the processes comprising the ``mongodb0`` shard. Chunk Management ---------------- -This section describes various operations on -:term:`chunks ` in :term:`shard clusters `. In -most cases MongoDB automates these processes; however, in some cases, -particularly when you're setting up a shard cluster, you may -need to create and manipulate chunks directly. +This section describes various operations on :term:`chunks ` in +:term:`shard clusters `. MongoDB automates these +processes; however, in some cases, particularly when you're setting up +a shard cluster, you may need to create and manipulate chunks +directly. .. _sharding-procedure-create-split: Splitting Chunks ~~~~~~~~~~~~~~~~ -Normally, MongoDB splits a :term:`chunk` following inserts or updates -when a chunk exceeds the :ref:`chunk size `. +Normally, MongoDB splits a :term:`chunk` following inserts when a +chunk exceeds the :ref:`chunk size `. Recently +split chunks may be moved immediately to a new shard if +:program:`mongos` predicts future insertions will benefit from the +move. + +The MongoDB treats all chunks the same, whether split manually or +automatically by the system. + +.. warning:: + + You cannot merge or combine chunks once you have split them. You may want to split chunks manually if: -- you have a large amount of data in your cluster that is *not* split, - as is the case after creating a shard cluster with existing data. +- you have a large amount of data in your cluster and very few + :term:`chunks `, + as is the case after creating a shard cluster from existing data. - you expect to add a large amount of data that would initially reside in a single chunk or shard. .. example:: - You plan to insert a large amount of data as the result of an - import process with :term:`shard key` values between ``300`` and - ``400``, *but* all values of your shard key between ``250`` and - ``500`` are within a single chunk. + You plan to insert a large amount of data with :term:`shard key` + values between ``300`` and ``400``, *but* all values of your shard + keys are between ``250`` and ``500`` are in a single chunk. -Use :func:`sh.status()` to determine the current chunks ranges across -the cluster. +To determine the current chunk ranges across the cluster, use +:method:`sh.status()` or :method:`db.printShardingStatus()`. -To split chunks manually, use either the :func:`sh.splitAt()` or -:func:`sh.splitFind()` helpers in the :program:`mongo` shell. -These helpers wrap the :dbcommand:`split` command. +To split chunks manually, use the :dbcommand:`split` command with +operators: ``middle`` and ``find``. The equivalent shell helpers are +:method:`sh.splitAt()` or :method:`sh.splitFind()`. .. example:: @@ -449,29 +459,97 @@ These helpers wrap the :dbcommand:`split` command. sh.splitFind( { "zipcode": 63109 } ) -:func:`sh.splitFind()` will split the chunk that contains the *first* document returned -that matches this query into two equal components. MongoDB will split -the chunk so that documents that have half of the shard keys in will -be in one chunk and the documents that have other half of the shard -keys will be a second chunk. The query in :func:`sh.splitFind()` need -not contain the shard key, though it almost always makes sense to -query for the shard key in this case, and including the shard key will -expedite the operation. +:method:`sh.splitFind()` will split the chunk containing the queried +document into two equal sized chunks, dividing the chunk using the +balancer algorithm. The :method:`sh.splitFind()` query may not be based +on the shard key, but it makes sense to use the shard key, and +including the shard key will expedite the operation. + +Use :method:`sh.splitAt()` to split a chunk in two using the queried +document as the partition point: + +.. code-block:: javascript + + sh.splitAt( { "zipcode": 63109 } ) However, the location of the document that this query finds with respect to the other documents in the chunk does not affect how the chunk splits. -Use :func:`sh.splitAt()` to split a chunk in two using the queried -document as the partition point: +Pre-Split +~~~~~~~~~ -.. code-block:: javascript +Splitting chunks will improve shard cluster performance when importing +data such as: - sh.splitAt( { "zipcode": 63109 } ) +- migrating data from another system to a shard cluster -.. warning:: +- performing a full shard cluster restore from back up - You cannot merge or combine chunks once you have split them. +In such cases, data import to an empty MongoDB shard cluster can +become slower than to a replica set. The reason for this is how +MongoDB writes data and manage chunks in a shard cluster. + +MongoDB inserts data into a chunk until it becomes large and splits on +the same shard. If the balancer notices a chunk imbalance between +shards, a migration process will begin to evenly distribute chunks. + +Migrating chunks between shards is extremely resource intensive as +shard members must notify, copy, update, and delete chunks between +each other and the configuration database. With a high volume import, +this migration process can reduce system performance so that imports +cannot occur because of migrations. + +To improve import performance, manually split and migrate chunks in an +empty shard cluster beforehand. This allows the shard cluster to only +write import data instead of trying to manage migrations and write +data. + +To prepare your shard cluster for data import, split and migrate +empty chunks. + +#. Split empty chunks in your collection by manually performing + :dbcommand:`split` command on chunks. + + .. example:: + + To pre-split chunks for 100 million user profiles sharded by + email address for 5 shards, run the following commands in the + mongo shell: + + .. code-block:: javascript + + for ( var x=97; x<97+26; x++ ){ + for( var y=97; y<97+26; y+=6 ) { + var prefix = String.fromCharCode(x) + String.fromCharCode(y); + db.runCommand( { split : "myapp.users" , middle : { email : prefix } } ); + } + } + +#. Move chunks to different shard manually using the + :dbcommand:`moveChunk` command. + + .. example:: + + To migrate all the chunks created for 100 million user profiles + evenly, putting each prefix chunk on the next shard from the + other , run the following commands in the mongo shell: + + .. code-block:: javascript + + var shServer = [ "sh0.example.net", "sh1.example.net", "sh2.example.net", "sh3.example.net", "sh4.example.net" ]; + for ( var x=97; x<97+26; x++ ){ + for( var y=97; y<97+26; y+=6 ) { + var prefix = String.fromCharCode(x) + String.fromCharCode(y); + db.adminCommand({moveChunk : "myapp.users", find : {email : prefix}, to : shServer[(y-97)/6]}) + } + } + + Optionally, you can also let the balancer automatically + redistribute chunks in your shard cluster. + +When empty chunks are distributed, data import can occur with multiple +:program:`mongos` instances, improving overall import performance. .. _sharding-balancing-modify-chunk-size: @@ -496,7 +574,7 @@ To modify the chunk size, use the following procedure: use config -#. Issue the following :func:`save() ` +#. Issue the following :method:`save() ` operation: .. code-block:: javascript @@ -640,7 +718,7 @@ Here's what this tells you: .. note:: - Use the :func:`sh.isBalancerRunning()` helper in the + Use the :method:`sh.isBalancerRunning()` helper in the :program:`mongo` shell to determine if the balancer is running, as follows: @@ -668,7 +746,7 @@ be able to migrate chunks: use config -#. Use an operation modeled on the following example :func:`update() +#. Use an operation modeled on the following example :method:`update() ` operation to modify the balancer's window: @@ -722,10 +800,10 @@ all migration, use the following procedure: If a balancing round is in progress, the system will complete the current round before the balancer is officially disabled. After - disabling, you can use the :func:`sh.getBalancerState()` shell + disabling, you can use the :method:`sh.getBalancerState()` shell function to determine whether the balancer is in fact disabled. -The above process and the :func:`sh.setBalancerState()` helper provide a +The above process and the :method:`sh.setBalancerState()` helper provide a wrapper on the following process, which may be useful if you need to run this operation from a driver that does not have helper functions: @@ -1090,6 +1168,6 @@ operation: for the duration of the backup procedure. Confirm that the balancer is not active using the -:func:`sh.getBalancerState()` method before starting a backup +:method:`sh.getBalancerState()` method before starting a backup operation. When the backup procedure is complete you can reactivate the balancer process.