Migrate splitting chunks - DOCS-304 #140

Merged
merged 9 commits into from
Aug 30, 2012
166 changes: 122 additions & 44 deletions source/administration/sharding.txt
@@ -125,7 +125,7 @@ more detail or use the following procedure as a quick starting point:
replicaSetName/<seed1>,<seed2>,<seed3>

For example, if the name of the replica set is "``repl0``", then
your :func:`sh.addShard` command would be:
your :method:`sh.addShard` command would be:

.. code-block:: javascript

@@ -136,7 +136,7 @@ more detail or use the following procedure as a quick starting point:
MongoDB enables sharding on a per-database basis. This is only a
meta-data change and will not redistribute your data. To enable
sharding for a given database, use the :dbcommand:`enableSharding`
command or the :func:`sh.enableSharding()` shell function.
command or the :method:`sh.enableSharding()` shell function.

.. code-block:: javascript

@@ -165,7 +165,7 @@ more detail or use the following procedure as a quick starting point:
collections must belong to a database for which you have enabled
sharding. When you shard a collection, you also choose the shard
key. To shard a collection, run the :dbcommand:`shardCollection`
command or the :func:`sh.shardCollection()` shell helper.
command or the :method:`sh.shardCollection()` shell helper.

.. code-block:: javascript

@@ -223,7 +223,7 @@ procedure:

#. First, you need to tell the cluster where to find the individual
shards. You can do this using the :dbcommand:`addShard` command or
the :func:`sh.addShard()` helper:
the :method:`sh.addShard()` helper:

.. code-block:: javascript

@@ -266,7 +266,7 @@ procedure:
following form: the replica set name, followed by a forward
slash, followed by a comma-separated list of seeds for the
replica set. For example, if the name of the replica set is
"myapp1", then your :func:`sh.addShard` command might resemble:
"myapp1", then your :method:`sh.addShard` command might resemble:

.. code-block:: javascript

@@ -312,7 +312,7 @@ The procedure to remove a shard is as follows:
shard name when you first ran the :dbcommand:`addShard` command. If not,
you can find out the name of the shard by running the
:dbcommand:`listshards` or :dbcommand:`printShardingStatus`
commands or the :func:`sh.status()` shell helper.
commands or the :method:`sh.status()` shell helper.

The following examples will remove a shard named ``mongodb0`` from the cluster.

@@ -404,41 +404,51 @@ stop the processes comprising the ``mongodb0`` shard.
Chunk Management
----------------

This section describes various operations on
:term:`chunks <chunk>` in :term:`shard clusters <shard cluster>`. In
most cases MongoDB automates these processes; however, in some cases,
particularly when you're setting up a shard cluster, you may
need to create and manipulate chunks directly.
This section describes various operations on :term:`chunks <chunk>` in
:term:`shard clusters <shard cluster>`. MongoDB automates these
processes; however, in some cases, particularly when you're setting up
a shard cluster, you may need to create and manipulate chunks
directly.
Contributor

re-flowing these paragraphs makes reviewing more difficult.

avoid doing this unless you're making a significant substantive change; otherwise it wastes time.

Author

the 2nd line edited.

Contributor

Sorry, I see now.

I think asserting:

  1. That this document addresses one topic.
  2. MongoDB automates these tasks.

Is rhetorically confusing. The original isn't perfect by any means, but let's revert unless there's another good solution given that this is orthogonal to the project of getting the page redirected.


.. _sharding-procedure-create-split:

Splitting Chunks
~~~~~~~~~~~~~~~~

Normally, MongoDB splits a :term:`chunk` following inserts or updates
when a chunk exceeds the :ref:`chunk size <sharding-chunk-size>`.
Normally, MongoDB splits a :term:`chunk` following inserts when a
chunk exceeds the :ref:`chunk size <sharding-chunk-size>`. Recently
split chunks may be moved immediately to a new shard if
:program:`mongos` predicts future insertions will benefit from the
move.

MongoDB treats all chunks the same, whether split manually or
automatically by the system.

.. warning::

You cannot merge or combine chunks once you have split them.

You may want to split chunks manually if:

- you have a large amount of data in your cluster that is *not* split,
as is the case after creating a shard cluster with existing data.
- you have a large amount of data in your cluster and very few
:term:`chunks <chunk>`,
as is the case after creating a shard cluster from existing data.

- you expect to add a large amount of data that would
initially reside in a single chunk or shard.

.. example::

You plan to insert a large amount of data as the result of an
import process with :term:`shard key` values between ``300`` and
``400``, *but* all values of your shard key between ``250`` and
``500`` are within a single chunk.
You plan to insert a large amount of data with :term:`shard key`
values between ``300`` and ``400``, *but* all values of your shard
key between ``250`` and ``500`` are in a single chunk.
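
As a rough illustration of the scenario above, the following sketch models chunk ranges as plain objects and shows that every key between 300 and 400 maps to a single chunk until that range is split. The ``before`` array and the ``findChunk`` and ``splitAt`` functions are hypothetical stand-ins written for this example; they are not MongoDB APIs, and real chunk ranges would come from ``sh.status()``.

```javascript
// Hypothetical model: each chunk covers shard key values in [min, max).
function findChunk(chunks, key) {
  return chunks.find(function (c) { return key >= c.min && key < c.max; });
}

// Return a new chunk list in which the chunk containing `point` is split there.
function splitAt(chunks, point) {
  var result = [];
  chunks.forEach(function (c) {
    if (point > c.min && point < c.max) {
      result.push({ min: c.min, max: point }, { min: point, max: c.max });
    } else {
      result.push(c);
    }
  });
  return result;
}

// Before splitting: one chunk holds every value from 250 up to 500.
var before = [{ min: 0, max: 250 }, { min: 250, max: 500 }, { min: 500, max: 1000 }];

// After splitting at 300 and 400, the import range has a chunk of its own.
var after = splitAt(splitAt(before, 300), 400);
```

In this model, the chunk covering 300 to 400 could then be placed on a different shard before the import begins.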

Use :func:`sh.status()` to determine the current chunks ranges across
the cluster.
To determine the current chunk ranges across the cluster, use
:method:`sh.status()` or :method:`db.printShardingStatus()`.

To split chunks manually, use either the :func:`sh.splitAt()` or
:func:`sh.splitFind()` helpers in the :program:`mongo` shell.
These helpers wrap the :dbcommand:`split` command.
To split chunks manually, use the :dbcommand:`split` command with
either the ``middle`` or the ``find`` field. The equivalent shell
helpers are :method:`sh.splitAt()` and :method:`sh.splitFind()`,
respectively.

.. example::

@@ -449,29 +459,97 @@ These helpers wrap the :dbcommand:`split` command.

sh.splitFind( { "zipcode": 63109 } )

:func:`sh.splitFind()` will split the chunk that contains the *first* document returned
that matches this query into two equal components. MongoDB will split
the chunk so that documents that have half of the shard keys in will
be in one chunk and the documents that have other half of the shard
keys will be a second chunk. The query in :func:`sh.splitFind()` need
not contain the shard key, though it almost always makes sense to
query for the shard key in this case, and including the shard key will
expedite the operation.
:method:`sh.splitFind()` will split the chunk containing the queried
document into two equal-sized chunks, dividing the chunk using the
balancer algorithm. The :method:`sh.splitFind()` query need not be
based on the shard key, but it makes sense to use the shard key, and
including the shard key will expedite the operation.

Use :method:`sh.splitAt()` to split a chunk in two using the queried
document as the partition point:

.. code-block:: javascript

sh.splitAt( { "zipcode": 63109 } )
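
The two helpers correspond to two forms of the ``split`` command document. The sketch below only builds those documents and does not contact a server. The function names and the ``records.people`` namespace are assumptions made for illustration; the examples above do not name a collection.

```javascript
// Build the command document behind sh.splitFind(): split the chunk that
// contains a document matching `query`.
function splitFindCommand(namespace, query) {
  return { split: namespace, find: query };
}

// Build the command document behind sh.splitAt(): split the chunk using
// the given shard key value as the boundary.
function splitAtCommand(namespace, point) {
  return { split: namespace, middle: point };
}

// "records.people" is a placeholder namespace chosen for this example.
var byFind = splitFindCommand("records.people", { zipcode: 63109 });
var byMiddle = splitAtCommand("records.people", { zipcode: 63109 });
```

In the :program:`mongo` shell, either document could be passed to ``db.adminCommand()`` while connected to a :program:`mongos`.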

Contributor

we can change the form of this admonition, but I think it's a crucial point, and it needs to remain in the document.

Author

it's still in the document. this line has been moved up in the document to line 427.

Contributor

right. sorry.

However, the location of the document that this query finds with
respect to the other documents in the chunk does not affect how the
chunk splits.

Use :func:`sh.splitAt()` to split a chunk in two using the queried
document as the partition point:
Pre-Split
~~~~~~~~~

.. code-block:: javascript
Splitting chunks before an import can improve shard cluster
performance in cases such as:

sh.splitAt( { "zipcode": 63109 } )
- migrating data from another system to a shard cluster

.. warning::
- performing a full shard cluster restore from backup

You cannot merge or combine chunks once you have split them.
In such cases, importing data into an empty MongoDB shard cluster can
be slower than importing into a replica set. The reason is how
MongoDB writes data and manages chunks in a shard cluster.

MongoDB inserts data into a chunk until the chunk exceeds the chunk
size and splits; both resulting chunks stay on the same shard. If the
balancer notices a chunk imbalance between shards, it begins a
migration process to distribute chunks evenly.

Migrating chunks between shards is extremely resource intensive,
because shard members must notify, copy, update, and delete chunks
between each other and the configuration database. With a high-volume
import, this migration process can reduce system performance to the
point where the import cannot proceed.

To improve import performance, manually split and migrate chunks in an
empty shard cluster beforehand. This allows the shard cluster to only
write import data instead of trying to manage migrations and write
data.

To prepare your shard cluster for data import, split and migrate
empty chunks.

#. Split empty chunks in your collection by manually running the
:dbcommand:`split` command.

.. example::

To pre-split chunks for 100 million user profiles, sharded by
email address across 5 shards, run the following commands in the
:program:`mongo` shell:

.. code-block:: javascript

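// Split at two-letter email prefixes: each first letter a-z paired with
// every sixth second letter (a, g, m, s, y). split is an admin command.
for ( var x=97; x<97+26; x++ ){
  for( var y=97; y<97+26; y+=6 ) {
    var prefix = String.fromCharCode(x) + String.fromCharCode(y);
    db.adminCommand( { split : "myapp.users" , middle : { email : prefix } } );
  }
}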

#. Move chunks to different shards manually using the
:dbcommand:`moveChunk` command.

Contributor

syntax error.

.. example::

To distribute all the chunks created for 100 million user profiles
evenly, placing each successive prefix chunk on the next shard in
turn, run the following commands in the :program:`mongo` shell:

.. code-block:: javascript

// Shards listed in the order in which they receive chunks.
var shServer = [ "sh0.example.net", "sh1.example.net", "sh2.example.net",
                 "sh3.example.net", "sh4.example.net" ];
for ( var x=97; x<97+26; x++ ){
  for( var y=97; y<97+26; y+=6 ) {
    var prefix = String.fromCharCode(x) + String.fromCharCode(y);
    // (y-97)/6 runs from 0 to 4, so the five prefixes per letter land on five shards.
    db.adminCommand( { moveChunk : "myapp.users", find : { email : prefix }, to : shServer[(y-97)/6] } );
  }
}

Optionally, you can also let the balancer automatically
redistribute chunks in your shard cluster.

When empty chunks are distributed, data import can occur with multiple
:program:`mongos` instances, improving overall import performance.
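
Before running the two loops above against a cluster, it can help to preview what they would do. The following sketch generates the same prefixes and pairs each with the ``split`` and ``moveChunk`` command documents, without sending anything to a server. The ``prefixes`` and ``planPreSplit`` functions are hypothetical helpers written for this example; the namespace and shard hosts are taken from the examples above.

```javascript
// Generate the two-letter split points: every first letter a-z combined
// with every sixth second letter (a, g, m, s, y).
function prefixes() {
  var out = [];
  for (var x = 97; x < 97 + 26; x++) {
    for (var y = 97; y < 97 + 26; y += 6) {
      out.push(String.fromCharCode(x) + String.fromCharCode(y));
    }
  }
  return out;
}

// Pair each prefix with a split command and a moveChunk command, assigning
// shards in round-robin order. Nothing is sent to a server here.
function planPreSplit(namespace, shards) {
  return prefixes().map(function (prefix, i) {
    return {
      split: { split: namespace, middle: { email: prefix } },
      move: { moveChunk: namespace, find: { email: prefix }, to: shards[i % shards.length] }
    };
  });
}

var shards = ["sh0.example.net", "sh1.example.net", "sh2.example.net",
              "sh3.example.net", "sh4.example.net"];
var plan = planPreSplit("myapp.users", shards);
```

With five shards, the round-robin index ``i % shards.length`` selects the same shard as ``(y-97)/6`` in the loop above, because each first letter contributes exactly five prefixes.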

.. _sharding-balancing-modify-chunk-size:

@@ -496,7 +574,7 @@ To modify the chunk size, use the following procedure:

use config

#. Issue the following :func:`save() <db.collection.save()>`
#. Issue the following :method:`save() <db.collection.save()>`
operation:

.. code-block:: javascript
@@ -640,7 +718,7 @@ Here's what this tells you:

.. note::

Use the :func:`sh.isBalancerRunning()` helper in the
Use the :method:`sh.isBalancerRunning()` helper in the
:program:`mongo` shell to determine if the balancer is
running, as follows:

@@ -668,7 +746,7 @@ be able to migrate chunks:

use config

#. Use an operation modeled on the following example :func:`update()
#. Use an operation modeled on the following example :method:`update()
<db.collection.update()>` operation to modify the balancer's
window:

@@ -722,10 +800,10 @@ all migration, use the following procedure:

If a balancing round is in progress, the system will complete the
current round before the balancer is officially disabled. After
disabling, you can use the :func:`sh.getBalancerState()` shell
disabling, you can use the :method:`sh.getBalancerState()` shell
function to determine whether the balancer is in fact disabled.

The above process and the :func:`sh.setBalancerState()` helper provide a
The above process and the :method:`sh.setBalancerState()` helper provide a
wrapper on the following process, which may be useful if you need to
run this operation from a driver that does not have helper functions:

@@ -1090,6 +1168,6 @@ operation:
for the duration of the backup procedure.

Confirm that the balancer is not active using the
:func:`sh.getBalancerState()` method before starting a backup
:method:`sh.getBalancerState()` method before starting a backup
operation. When the backup procedure is complete you can reactivate
the balancer process.