DOCS-249 write operations: first draft #345

255 changes: 232 additions & 23 deletions draft/core/write-operations.txt
@@ -10,17 +10,15 @@ Synopsis
Operations
----------

The :doc:`/crud` section of this manual contains specific
documentation for the major classes of write operations for MongoDB
databases. Read the following pages for additional examples and
documentation:
The :doc:`/crud` section of this manual describes the major classes of
write operations for MongoDB databases:

:doc:`/applications/create`
:doc:`/applications/delete`
:doc:`/applications/update`
- :doc:`/applications/create`
- :doc:`/applications/update`
- :doc:`/applications/delete`

Also consider the following methods in the :program:`mongo` JavaScript
shell that allow you to write or change data in a MongoDB database.
The following methods in the :program:`mongo` JavaScript shell allow you
to write or change data in a MongoDB database.

- :method:`db.collection.insert()`
- :method:`db.collection.update()`
@@ -29,43 +27,254 @@ shell that allow you to write or change data in a MongoDB database.
- :method:`db.collection.remove()`
- :method:`db.collection.delete()`

Consider the documentation for your client library or :doc:`driver
See the documentation for your client library or :doc:`driver
</applications/drivers>` for more information on how to access this
functionality from within your application.

Write Concern and Write Safety
------------------------------
.. index:: write concern
.. _write-operations-write-concern:

.. todo:: import and tweak section from the replica-set page. When we
publish this document we'll have to do a quick deletion/reduction
of the replica-set section, but during the editorial process the
content can be duplicated.
Write Concern
-------------

.. todo add note about all drivers after `date` will have w:1 write concern for all operations by default.

The :term:`write concern` option allows you to configure return
confirmation for some or all write operations.

By default, when a :term:`client` sends a write operation to a database
server, the operation returns without waiting for the server to complete
it, and therefore without confirming that the operation succeeded.

To enable write concern and get return status on a write operation, use
the :dbcommand:`getLastError` command.

By default, the command confirms that the :program:`mongod` instance
received the write operation and has committed it to the in-memory
representation of the database. This provides a simple, low-latency
level of write concern that allows your application to detect when the
:program:`mongod` instance becomes inaccessible and to catch insertion
errors such as :ref:`duplicate key errors <index-type-unique>`.

You can modify the level of write concern that
:dbcommand:`getLastError` confirms by issuing the command with one or
both of the following options:

- ``j`` or "journal" option.

In addition to the default confirmation provided by
:dbcommand:`getLastError`, this option confirms that the
:program:`mongod` instance has written the data to the on-disk
journal. This ensures that the data is durable if :program:`mongod` or
the server itself crashes or shuts down unexpectedly.

- ``w`` option. This applies only to :term:`replica sets <replica set>`.

This option confirms that the write operation has replicated to a
specified number of replica set members. Specify either a number of
members or ``majority``, which ensures that the write propagates to a
majority of set members. The following command ensures that the
operation has replicated to two members:

.. code-block:: javascript

   db.runCommand( { getLastError: 1, w: 2 } )

The default value of ``w`` is ``1``.

If you specify a ``w`` value greater than the number of available
non-:term:`arbiter` replica set members, the operation will block
until those members become available. This could cause the operation
to block forever. To specify a timeout threshold for the
:dbcommand:`getLastError` operation, use the ``wtimeout`` argument.
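
As a minimal sketch of how the ``j``, ``w``, and ``wtimeout`` options
combine, the following builds the command document that you would pass
to ``db.runCommand()``. The helper name ``buildGetLastError`` is
hypothetical and exists only for illustration; only the field names come
from the :dbcommand:`getLastError` command. The ``wtimeout`` value is in
milliseconds.

.. code-block:: javascript

   // Hypothetical helper: assembles a getLastError command document.
   // Only the field names (getLastError, w, j, wtimeout) are MongoDB's.
   function buildGetLastError(options) {
      var cmd = { getLastError: 1 };
      if (options.w !== undefined) { cmd.w = options.w; }
      if (options.j) { cmd.j = true; }
      if (options.wtimeout !== undefined) { cmd.wtimeout = options.wtimeout; }
      return cmd;
   }

   // Wait for replication to two members, but give up after 5000 ms
   // instead of blocking forever.
   var cmd = buildGetLastError({ w: 2, wtimeout: 5000 });

   // In the mongo shell you would then run: db.runCommand(cmd)

Setting ``wtimeout`` whenever ``w`` is greater than ``1`` keeps an
unavailable member from blocking the application indefinitely.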

Many drivers have a write concern that automatically issues
:dbcommand:`getLastError` after write operations to ensure the
operations complete.

Write concern provides confirmation of write operations, but confirmed
writes can take longer to return, and not all applications require
them. Consider the following operations:

.. code-block:: javascript

   db.runCommand( { getLastError: 1, w: "majority" } )
   db.getLastErrorObj("majority")

These equivalent :dbcommand:`getLastError` operations ensure that write
operations return only after a write operation has replicated to a
majority of the members of a replica set.

You can configure default :dbcommand:`getLastError` behavior for a
replica set. Use the :data:`settings.getLastErrorDefaults` setting in
the :doc:`replica set configuration </reference/replica-configuration>`.
For instance:

.. code-block:: javascript

   cfg = rs.conf()
   cfg.settings = {}
   cfg.settings.getLastErrorDefaults = {w: "majority", j: true}
   rs.reconfig(cfg)

When the new configuration is active, the :dbcommand:`getLastError`
operation waits for the write operation to complete on a majority of the
set members before returning. Specifying ``j: true`` makes
:dbcommand:`getLastError` wait for a complete commit of the operations
to the journal before returning.

The :data:`getLastErrorDefaults` setting only affects :dbcommand:`getLastError`
commands with *no* other arguments.

.. note::

Use of inappropriate write concern can lead to :ref:`rollbacks
<replica-set-rollbacks>` in the case of :ref:`replica set failover
<replica-set-failover>`. Always ensure that your operations have
specified the required write concern for your application.

For more information, see :ref:`replica-set-write-concern`.

.. index:: read preference
.. index:: slaveOk
.. _write-operations-bulk-insert:

Bulk Inserts
------------

:issue:`SERVER-2395`
Bulk inserts let you pass multiple documents to the :method:`insert()`
method at once. Inserting a large number of documents in one operation
lets MongoDB distribute the write performance penalty across those
documents. All write concern options apply to bulk inserts.

If you insert data without write concern, the bulk insert gain might be
insignificant. But if you insert data with write concern configured,
bulk insert can bring significant performance gains by distributing the
penalty over the group of inserts.
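
To illustrate the grouping described above, the following sketch splits
an array of documents into batches so that each batch can go to the
server in a single call. The helper ``toBatches``, the ``events``
collection, and the batch size of 1,000 are assumptions made for this
example, not recommendations.

.. code-block:: javascript

   // Hypothetical helper: splits an array of documents into batches of
   // at most `size` documents each.
   function toBatches(docs, size) {
      var batches = [];
      for (var i = 0; i < docs.length; i += size) {
         batches.push(docs.slice(i, i + size));
      }
      return batches;
   }

   // Build 2,500 sample documents and group them into batches.
   var events = [];
   for (var n = 0; n < 2500; n++) {
      events.push({ seq: n, type: "click" });
   }
   var batches = toBatches(events, 1000);

   // In the mongo shell, each batch could then be inserted in one call,
   // assuming a shell version that accepts an array of documents:
   //    batches.forEach(function (batch) { db.events.insert(batch); });

With write concern enabled, the confirmation cost is then paid once per
batch rather than once per document.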

Bulk inserts are often used with :term:`sharded collections <sharded
collection>` and are more effective when the collection is already
populated and MongoDB has already determined the key distribution.
Otherwise MongoDB needs time to learn and determine the distribution.

If the collection is not populated, you can avoid the learning time by
predefining key ranges, as described in
:ref:`sharding-administration-pre-splitting`.
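
As a rough sketch of predefining key ranges, the following computes
evenly spaced split points for a numeric shard key. The helper
``splitPoints``, the key range, the namespace ``records.events``, and
the field ``userId`` are all hypothetical; real split points should
reflect the expected distribution of your data.

.. code-block:: javascript

   // Hypothetical helper: returns (chunks - 1) evenly spaced split
   // points strictly between min and max.
   function splitPoints(min, max, chunks) {
      var points = [];
      var step = (max - min) / chunks;
      for (var i = 1; i < chunks; i++) {
         points.push(Math.floor(min + i * step));
      }
      return points;
   }

   var points = splitPoints(0, 1000, 4);   // [ 250, 500, 750 ]

   // In a mongo shell connected to a mongos, each point could then be
   // passed to the split command, for example:
   //    db.adminCommand({ split: "records.events", middle: { userId: 250 } })
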

When you perform bulk inserts, you can parallelize the import by sending
inserts to multiple :program:`mongos` instances.

To distribute data *during* bulk inserts or if the cluster becomes
uneven, see :ref:`Migrating Chunks
<sharding-balancing-manual-migration>`.

.. todo:: import the best content from: http://www.mongodb.org/display/DOCS/Bulk+Inserts sl
split between this section and the sharded clusters section.
If possible, consider using bulk inserts to insert event data.

For more information see :ref:`write-operations-sharded-clusters` and
:doc:`/administration/import-export`.

Indexing
--------

.. todo:: short section on the impact of indexes and index maintenance
on write operations.
After every insert, update, or delete operation, MongoDB updates not
only a collection but *every* index associated with the collection.
Therefore, every index on a collection adds some amount of
write-performance penalty.

In general, the performance gains that indexes realize for read
operations are worth the insertion penalty. But if your application is
write-heavy, be careful when creating new indexes.

For more information, see :doc:`/applications/indexes`.

Isolation
---------

- atomicity
- :doc:`/tutorial/perform-two-phase-commits`
All operations on a single MongoDB document are atomic. An update
operation may modify more than one field, at more than one level of
nesting, in a single operation that either succeeds or fails as a whole
and cannot leave the document in an in-between state.

For more information see :doc:`Isolated write operations
</reference/operator/atomic>` and
:doc:`/tutorial/perform-two-phase-commits`.

Architecture
------------

Replica Sets
~~~~~~~~~~~~

If you are performing a large data ingestion or bulk load operation that
requires a large number of writes to the primary, the secondaries may
not be able to read the oplog fast enough to keep up with the changes.

To prevent this, use write concern so that MongoDB performs a safe
write (i.e. calls :dbcommand:`getLastError`) after every 100, 1,000, or
other designated number of operations. This provides an opportunity for
secondaries to catch up with the primary. Using safe writes, even in
batches, can reduce write throughput and slow the overall progress of
the batch; however, calling :dbcommand:`getLastError` prevents the
secondaries from falling too far behind the primary.
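
The periodic safe write described above can be sketched as a loop that
pauses for confirmation after every *N* inserts. The function
``loadWithPeriodicConfirm`` and its callback parameters are hypothetical;
in the :program:`mongo` shell the callbacks might wrap
:method:`db.collection.insert()` and a :dbcommand:`getLastError` call
with a suitable ``w`` value.

.. code-block:: javascript

   // Hypothetical helper. `insertOne` and `confirm` are supplied by the
   // caller, so this sketch does not depend on a live connection.
   function loadWithPeriodicConfirm(docs, every, insertOne, confirm) {
      var confirmations = 0;
      for (var i = 0; i < docs.length; i++) {
         insertOne(docs[i]);
         // Wait for confirmation after every `every` inserts, and once
         // more after the final insert.
         if ((i + 1) % every === 0 || i === docs.length - 1) {
            confirm();
            confirmations++;
         }
      }
      return confirmations;
   }

   // Usage with stand-in callbacks: five documents, confirm every two.
   var inserted = [];
   var count = loadWithPeriodicConfirm(
      [ { x: 1 }, { x: 2 }, { x: 3 }, { x: 4 }, { x: 5 } ],
      2,
      function (doc) { inserted.push(doc); },   // stand-in for an insert
      function () {}                            // stand-in for getLastError
   );

A larger interval favors throughput, while a smaller interval keeps the
secondaries closer to the primary.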

For more information see :ref:`replica-set-write-concern`,
:ref:`replica-set-oplog-sizing`, :ref:`replica-set-oplog`, and
:ref:`replica-set-procedure-change-oplog-size`.

.. _write-operations-sharded-clusters:

Sharded Clusters
~~~~~~~~~~~~~~~~

In a :term:`sharded cluster`, a given write operation goes to a
particular :term:`shard` and :term:`chunk` in the cluster. Several
factors affect write performance, including the number of writes and
the key ranges of the chunks.

If you insert many documents in rapid succession, MongoDB initially
directs writes to a single chunk, which can affect performance.

If your shard key increases monotonically, all inserts go to the same
chunk. The system will adjust the metadata to keep balance, but at a
given time ``t`` all writes will go to a single shard, which is
undesirable if the insert rate is extremely high. To avoid this,
consider using a shard key that does not increase in value. For example,
in some cases you could reverse all the bits of your shard key, which
preserves the information yet avoids the increasing sequence of values.
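
The bit reversal mentioned above can be sketched as follows, assuming
for illustration that the shard key is an unsigned 32-bit integer. The
function name ``reverseBits32`` is hypothetical.

.. code-block:: javascript

   // Reverses the bits of an unsigned 32-bit integer.
   function reverseBits32(value) {
      var result = 0;
      for (var i = 0; i < 32; i++) {
         // Shift the result left and append the lowest remaining bit.
         result = (result << 1) | (value & 1);
         value >>>= 1;
      }
      // ">>> 0" reinterprets the signed 32-bit result as unsigned.
      return result >>> 0;
   }

   // Consecutive keys no longer form an increasing sequence:
   //    reverseBits32(1) is 2147483648
   //    reverseBits32(2) is 1073741824
   //    reverseBits32(3) is 3221225472

Applying the function twice returns the original value, so the
application can recover the original key when it needs it.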

Note that :term:`BSON` :term:`ObjectIds <ObjectId>` have this
increasing property. You might wish, at generation time, to reverse the
bits of the ObjectIds, or swap the first and last 16-bit words, to
"shuffle" the inserts. Alternatively, you might use UUIDs instead (but
check that your UUID generator does not consistently generate increasing
UUIDs, or you would get the same behavior).

Shard key values that are strictly increasing are fine if the insert
volume is within the range that a single shard can process at a given
point in time.

.. example:: The following example, in C++, swaps the leading and
   trailing 16-bit words of generated ObjectIds so that they are no
   longer monotonically increasing.

   .. code-block:: cpp

      #include <utility>  // std::swap

      using namespace mongo;

      // Generate an ObjectId, then swap its first and last 16-bit words
      // (bytes 0-1 and bytes 10-11 of the 12-byte value).
      OID make_an_id() {
        OID x = OID::gen();
        const unsigned char *p = x.getData();
        // The C-style casts discard const so the bytes can be swapped in place.
        std::swap( (unsigned short&) p[0], (unsigned short&) p[10] );
        return x;
      }

      void foo() {
        // create an object
        BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" );
        // now we might insert o into a sharded collection...
      }

For more information, see :doc:`/administration/sharding` and
:ref:`write-operations-bulk-insert`.
8 changes: 4 additions & 4 deletions source/applications/replication.txt
@@ -20,8 +20,8 @@ This document describes those options and their implications.
.. _write-concern:
.. _replica-set-write-concern:

Write Concern
-------------
Write Concern for Replica Sets
------------------------------

When a :term:`client` sends a write operation to a database server, the
operation returns without waiting for the operation to succeed or
@@ -125,8 +125,8 @@ commands with *no* other arguments.
.. _replica-set-read-preference:
.. _slaveOk:

Read Preference
---------------
Read Preference for Replica Sets
--------------------------------

Read preference describes how MongoDB clients route read operations to
:term:`secondary` members of a :term:`replica set`.