From d74deffab32752d364a5262833efa50208a02f13 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Tue, 23 Oct 2012 14:47:53 -0400 Subject: [PATCH 1/2] DOCS-249 write operations: first draft --- draft/core/write-operations.txt | 255 +++++++++++++++++++++++++--- source/applications/replication.txt | 8 +- 2 files changed, 236 insertions(+), 27 deletions(-) diff --git a/draft/core/write-operations.txt b/draft/core/write-operations.txt index 6058d6ca7f1..04b2b4fdacd 100644 --- a/draft/core/write-operations.txt +++ b/draft/core/write-operations.txt @@ -10,17 +10,15 @@ Synopsis Operations ---------- -The :doc:`/crud` section of this manual contains specific -documentation for the major classes of write operations for MongoDB -databases. Read the following pages for additional examples and -documentation: +The :doc:`/crud` section of this manual describes the major classes of +write operations for MongoDB databases: -:doc:`/applications/create` -:doc:`/applications/delete` -:doc:`/applications/update` +- :doc:`/applications/create` +- :doc:`/applications/update` +- :doc:`/applications/delete` -Also consider the following methods in the :program:`mongo` JavaScript -shell that allow you to write or change data in a MongoDB database. +The following methods in the :program:`mongo` JavaScript shell allow you +to write or change data in a MongoDB database. - :method:`db.collection.insert()` - :method:`db.collection.update()` @@ -29,37 +27,179 @@ shell that allow you to write or change data in a MongoDB database. - :method:`db.collection.remove()` - :method:`db.collection.delete()` -Consider the documentation for your client library or :doc:`driver +See the documentation for your client library or :doc:`driver ` for more information on how to access this functionality from within your application. -Write Concern and Write Safety ------------------------------- +.. index:: write concern +.. _write-operations-write-concern: -.. todo:: import and tweak section from the replica-set page. 
When we - publish this document we'll have to do a quick deletion/reduction - of the replica-set section, but during the editorial process the - content can be duplicated. +Write Concern +------------- + +.. todo add note about all drivers after `date` will have w:1 write concern for all operations by default. + +The :term:`write concern` option allows you to configure the +confirmation returned for some or all write operations. + +By default, when a :term:`client` sends a write operation to a database +server, the operation returns without waiting for the write +to complete and therefore without confirming the success of the +operation. + +To enable write concern and get return status on a write operation, use +the :dbcommand:`getLastError` command. + +By default, the command confirms that the :program:`mongod` instance +has received the write operation and committed it +to the in-memory representation of the database. This provides +a simple and low-latency level of write concern and allows your +application to detect an inaccessible :program:`mongod` instance +or insertion failures caused by :ref:`duplicate key +errors `. + +You can modify the level of write concern confirmed by +:dbcommand:`getLastError` by issuing the command with one or both of +the following options: + +- ``j`` or "journal" option. + + In addition to the default confirmation provided by + :dbcommand:`getLastError`, this option confirms that the + :program:`mongod` instance has written the data to the on-disk + journal. This ensures that the data is durable if :program:`mongod` or + the server itself crashes or shuts down unexpectedly. + +- ``w`` option. This applies only to :term:`replica sets `. + + This option confirms that the write operation has replicated to a + specified number of replica set members. You can specify a number + of members or specify ``majority`` to ensure that the write propagates + to a majority of set members.
The following command returns only after the operation has + replicated to two members: + + .. code-block:: javascript + + db.runCommand( { getLastError: 1, w: 2 } ) + + The default value of ``w`` is ``1``. + + If you specify a ``w`` value greater than the number of available + non-:term:`arbiter` replica set members, the operation will block + until those members become available. This could cause the operation + to block forever. To specify a timeout threshold for the + :dbcommand:`getLastError` operation, use the ``wtimeout`` argument. + +Many drivers provide a write concern setting that automatically issues +:dbcommand:`getLastError` after write operations to confirm that the +operations complete. + +Write concern provides confirmation of write operations, but confirmed writes take +longer and are not required in all applications. Consider the following +operations: + +.. code-block:: javascript + + db.runCommand( { getLastError: 1, w: "majority" } ) + db.getLastErrorObj("majority") + +These equivalent :dbcommand:`getLastError` operations return +only after the preceding write operation has replicated to a +majority of the members of a replica set. + +You can configure default :dbcommand:`getLastError` behavior for a +replica set. Use the :data:`settings.getLastErrorDefaults` setting in +the :doc:`replica set configuration `. +For instance: + +.. code-block:: javascript + + cfg = rs.conf() + cfg.settings = {} + cfg.settings.getLastErrorDefaults = {w: "majority", j: true} + rs.reconfig(cfg) + +When the new configuration is active, the :dbcommand:`getLastError` +operation waits for the write operation to complete on a majority of the +set members before returning. Specifying ``j: true`` makes +:dbcommand:`getLastError` wait for a complete commit of the operations +to the journal before returning. + +The :data:`getLastErrorDefaults` setting only affects :dbcommand:`getLastError` +commands with *no* other arguments. + +.. 
note:: + + Use of an inappropriate write concern can lead to :ref:`rollbacks + ` in the case of :ref:`replica set failover + `. Always ensure that your operations have + specified the required write concern for your application. + +For more information, see :ref:`replica-set-write-concern`. + +.. index:: read preference +.. index:: slaveOk +.. _write-operations-bulk-insert: Bulk Inserts ------------ -:issue:`SERVER-2395` +A bulk insert allows MongoDB to spread the write performance penalty +across a large number of documents inserted at once. Bulk +inserts let you pass multiple documents to the :method:`insert()` method at +once. All write concern options apply to bulk inserts. + +If you insert data without write concern, the gain from bulk inserts might be +insignificant. But if you insert data with write concern configured, +bulk inserts can bring significant performance gains by distributing the +penalty over the group of inserts. + +Bulk inserts are often used with :term:`sharded collections ` and are more effective when the collection is already +populated and MongoDB has already determined the key distribution. +Otherwise MongoDB needs time to learn and determine the distribution. + +If the collection is not populated, you can avoid the learning time by +predefining key ranges, as described in +:ref:`sharding-administration-pre-splitting`. + +When you perform bulk inserts, you can import in parallel by sending +inserts to multiple :program:`mongos` instances. + +To redistribute data *during* bulk inserts, or if the cluster becomes +unbalanced, see :ref:`Migrating Chunks +`. -.. todo:: import the best content from: http://www.mongodb.org/display/DOCS/Bulk+Inserts sl - split between this section and the sharded clusters section. +If possible, consider using bulk inserts to insert event data. + +For more information, see :ref:`write-operations-sharded-clusters` and +:doc:`/administration/import-export`. Indexing -------- -.. 
todo:: short section on the impact of indexes and index maintenance - on write operations. +After every insert, update, or delete operation, MongoDB updates not +only the collection but also *every* index associated with the collection. +Therefore, every index on a collection adds some +overhead to write operations. + +In general, the performance gains that indexes realize for read +operations are worth the insertion penalty. But if your application is +write-heavy, be careful when creating new indexes. + +For more information, see :doc:`/applications/indexes`. Isolation --------- -- atomicity -- :doc:`/tutorial/perform-two-phase-commits` +All write operations on a single MongoDB document are atomic. An update +operation may modify more than one field at more than one level +(nesting) in a single operation that will either succeed or fail and +cannot leave the document in an in-between state. + +For more information, see :doc:`Isolated write operations +` and +:doc:`/tutorial/perform-two-phase-commits`. Architecture ------------ @@ -67,5 +207,74 @@ Architecture Replica Sets ~~~~~~~~~~~~ +If you are performing a large data ingestion or bulk load operation that +requires a large number of writes to the primary, the secondaries may +not be able to read the oplog fast enough to keep up with changes. +Setting some level of write concern can slow the overall progress of the +batch but will prevent the secondaries from falling too far behind. + +To prevent this lag, use write concern so that your application performs a safe +write (i.e. calls :dbcommand:`getLastError`) after every 100, 1,000, or +other designated number of operations. This provides an opportunity for +secondaries to catch up with the primary. Using safe writes, even in +batches, can reduce write throughput; however, calling +:dbcommand:`getLastError` prevents the secondaries from falling too +far behind the primary.
+ +For more information, see :ref:`replica-set-write-concern`, +:ref:`replica-set-oplog-sizing`, :ref:`replica-set-oplog`, and +:ref:`replica-set-procedure-change-oplog-size`. + +.. _write-operations-sharded-clusters: + Sharded Clusters ~~~~~~~~~~~~~~~~ + +In a :term:`sharded cluster`, a given write operation goes to a +particular :term:`shard` and :term:`chunk` in the cluster. Write +performance is affected by a number of factors, including the number of +writes and the key ranges of the chunks. + +If you insert many documents in rapid succession, MongoDB initially +directs writes to a single chunk, which can affect performance. + +If your shard key increases monotonically, the system +will adjust the metadata to keep balance, but at any given time ``t`` all +writes will go to a single shard, which is undesirable if the insert rate is +extremely large. To avoid this, consider using a shard key that is not +increasing in value. For example in some cases you could reverse all the +bits of your shard key, which is information preserving yet then avoids +the increasing sequence of values. + +Note that BSON ObjectIds have this property. You might wish at +generation time to reverse the bits of the ObjectIds, or swap the first +and last 16 bit words, to "shuffle" the inserts. Alternatively you might +use UUIDs instead (but check that your UUID generator does not generate +increasing UUIDs consistently or you would get the same behavior). + +Shard key values that are strictly increasing are fine if the insert +volume is within the range that a single shard can process at a given +point in time. + +.. example:: The following example, in C++, swaps the leading and + trailing 16-bit words of generated ObjectIds so that they are no + longer monotonically increasing. + + .. 
code-block:: none + + using namespace mongo; + OID make_an_id() { + OID x = OID::gen(); + const unsigned char *p = x.getData(); + swap( (unsigned short&) p[0], (unsigned short&) p[10] ); + return x; + } + + void foo() { + // create an object + BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" ); + // now we might insert o into a sharded collection... + } + +For more information, see :doc:`/administration/sharding` and +:ref:`write-operations-bulk-insert`. diff --git a/source/applications/replication.txt b/source/applications/replication.txt index 4b549f1071c..1e87f89fa3b 100644 --- a/source/applications/replication.txt +++ b/source/applications/replication.txt @@ -20,8 +20,8 @@ This document describes those options and their implications. .. _write-concern: .. _replica-set-write-concern: -Write Concern -------------- +Write Concern for Replica Sets +------------------------------ When a :term:`client` sends a write operation to a database server, the operation returns without waiting for the operation to succeed or @@ -125,8 +125,8 @@ commands with *no* other arguments. .. _replica-set-read-preference: .. _slaveOk: -Read Preference ---------------- +Read Preference for Replica Sets +-------------------------------- Read preference describes how MongoDB clients route read operations to :term:`secondary` members of a :term:`replica set`. From 663815dd88c532323f6866f23b2ae4564c9ea1cd Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Tue, 23 Oct 2012 16:58:16 -0400 Subject: [PATCH 2/2] DOCS-249 write operations: minor edit --- draft/core/write-operations.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/draft/core/write-operations.txt b/draft/core/write-operations.txt index 04b2b4fdacd..7fb9951bf7f 100644 --- a/draft/core/write-operations.txt +++ b/draft/core/write-operations.txt @@ -246,7 +246,7 @@ increasing in value. 
For example in some cases you could reverse all the bits of your shard key, which is information preserving yet then avoids the increasing sequence of values. -Note that BSON ObjectIds have this property. You might wish at +Note that :term:`BSON` :term:`ObjectIds ` have this property. You might wish at generation time to reverse the bits of the ObjectIds, or swap the first and last 16 bit words, to "shuffle" the inserts. Alternatively you might use UUIDs instead (but check that your UUID generator does not generate