Commit b7f8111

Author: Sam Kleinman
DOCS-197 unique keys in shard collections pattern
1 parent 28ff4f4 commit b7f8111

1 file changed: 188 additions, 0 deletions

===========================================
Enforce Unique Keys for Sharded Collections
===========================================

.. default-domain:: mongodb

Overview
--------

The :dbcommand:`unique <ensureIndex>` constraint on indexes ensures
that no two documents in a :term:`collection` have the same value for
a field. For :ref:`sharded collections, these unique indexes cannot
enforce uniqueness <limit-sharding-unique-indexes>` because insert and
indexing operations are local to each shard, and no shard has
information about the documents stored on the others.

If you specify a unique index on a sharded collection, MongoDB can
only enforce uniqueness among the documents located on a single shard
*at the time of creation*. This document provides two approaches to
enforcing uniqueness:

#. Enforce uniqueness of the shard key.

   MongoDB *can* enforce uniqueness on the shard key. However, for
   compound shard keys, MongoDB enforces uniqueness on the *entire*
   key combination, not on any single component of the shard key.

#. Use a secondary collection to enforce uniqueness.

   Create a minimal collection that contains only the unique field and
   a reference to a document in the main collection. If you always
   insert into the secondary collection *before* inserting into the
   main collection, MongoDB will produce an error whenever you attempt
   to use a duplicate key.

   .. note::

      If you have a small data set, you may not need to shard this
      collection and you can create multiple unique indexes; otherwise
      you can shard on a single unique key.

Unique Constraints on the Shard Key
-----------------------------------

.. _sharding-pattern-unique-procedure-shard-key:

Process
~~~~~~~

When sharding a collection using the :dbcommand:`shardcollection`
command, you can specify a ``unique`` constraint in the following
form:

.. code-block:: javascript

   db.runCommand( { shardcollection : "test.users" , key : { email : 1 } , unique : true } );

Remember that the index on the ``_id`` field is always unique. By
default, MongoDB inserts an ``ObjectId`` into the ``_id`` field, but
you can supply your own ``_id`` value and use this field as the shard
key. MongoDB will then ensure that the ``_id`` field is unique across
the sharded collection. Use the following operation to shard on the
``_id`` field:

.. code-block:: javascript

   db.runCommand( { shardcollection : "test.users" , key : { _id : 1 } } )

.. note::

   In any sharded collection where you are *not* sharding by the
   ``_id`` field, you and your application must maintain the
   uniqueness of the ``_id`` field. The best way to ensure unique
   ``_id`` values is to use ``ObjectId`` values or another kind of
   universally unique identifier (UUID).
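
As a minimal sketch of generating ``_id`` values in the application, the following Node.js snippet assigns a random UUID to any document that does not already carry an ``_id``. The helper name ``withUniqueId`` and the sample documents are hypothetical and exist only for illustration; ``randomUUID`` comes from Node's built-in ``crypto`` module.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical helper: return the document with a client-generated UUID
// as _id, unless the caller already supplied an _id of their own.
function withUniqueId(doc) {
  return doc._id === undefined ? { _id: randomUUID(), ...doc } : doc;
}

// Sample documents (illustrative values only).
const a = withUniqueId({ email: "a@example.net" });
const b = withUniqueId({ email: "b@example.net" });
const c = withUniqueId({ _id: "custom-id", email: "c@example.net" });
```

Random UUIDs make collisions extremely unlikely, but they do not replace a unique index: nothing in this sketch checks the values against the database.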

Limitations
~~~~~~~~~~~

- You can only enforce uniqueness on one field in a collection using
  this method.

- If you use a compound shard key, you will only be able to enforce
  uniqueness of the *combination* of component keys in the shard
  key.
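
The second limitation can be illustrated without a database. The sketch below uses a hypothetical in-memory stand-in for a unique index (``makeUniqueIndex`` is not a MongoDB API) over an assumed compound key of ``email`` and ``region``; it shows that the same ``email`` is accepted twice as long as the full combination differs.

```javascript
// Hypothetical stand-in for a unique compound index: it remembers every
// key combination it has seen and rejects only exact repeats.
function makeUniqueIndex(fields) {
  const seen = new Set();
  return function tryInsert(doc) {
    const key = JSON.stringify(fields.map((f) => doc[f]));
    if (seen.has(key)) return false; // duplicate combination, rejected
    seen.add(key);
    return true;
  };
}

// Assumed compound key { email, region }; field names are illustrative.
const tryInsert = makeUniqueIndex(["email", "region"]);
const results = [
  tryInsert({ email: "a@example.net", region: "eu" }),
  tryInsert({ email: "a@example.net", region: "us" }), // same email, new combination
  tryInsert({ email: "a@example.net", region: "eu" }), // exact repeat
];
```

Only the third call is rejected, so a compound key of this shape would not keep ``email`` unique on its own.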

In most cases, the best shard keys are compound keys that include
elements that permit :ref:`write scaling <sharding-shard-key-write-scaling>`
and :ref:`query isolation <sharding-shard-key-query-isolation>`, as
well as :ref:`high cardinality <sharding-shard-key-cardinality>`.
These ideal shard keys are often not the fields that require
uniqueness, in which case you need a different approach.

Unique Constraints on Arbitrary Fields
--------------------------------------

If you cannot use your unique field as the shard key, or you need to
enforce uniqueness over more than one field, you must create another
:term:`collection` that contains both a reference to the original
document (i.e. its ``ObjectId``) and the unique key.

If you must shard this "proxy" collection, then shard on the unique
key using the :ref:`above procedure
<sharding-pattern-unique-procedure-shard-key>`; otherwise, you can
simply create unique indexes on the collection.

Process
~~~~~~~

Consider the following schema for the "proxy" collection:

.. code-block:: javascript

   {
     "_id" : ObjectId("..."),
     "email" : "..."
   }

Here, the ``_id`` field holds the ``ObjectId`` of the
:term:`document` it refers to, and the ``email`` field is the field on
which you want to ensure uniqueness.

If you need to shard this collection, use the following operation,
which uses the ``email`` field as the :term:`shard key`:

.. code-block:: javascript

   db.runCommand( { shardcollection : "records.proxy" , key : { email : 1 } , unique : true } );

If you do not need to shard the proxy collection, use the following
command to create a unique index on the ``email`` field:

.. code-block:: javascript

   db.proxy.ensureIndex( { "email" : 1 }, {unique : true} )

You may create multiple unique indexes on this collection if you do
not plan to shard its contents.

Then, to insert documents, use the following procedure in the
:ref:`JavaScript shell <mongo>`:

.. code-block:: javascript

   use records

   // generate one identifier to share between both collections
   primary_id = ObjectId()

   // reserve the unique value in the proxy collection first
   db.proxy.insert({
      "_id" : primary_id,
      "email" : "[email protected]"
   })

   // if the above operation returns successfully,
   // then continue:

   db.information.insert({
      "_id" : primary_id,
      "email" : "[email protected]"
      // additional information...
   })

You must insert a document into the ``proxy`` collection first. If
this operation succeeds, the ``email`` field is unique and you may
continue by inserting the actual document into the ``information``
collection.

.. see:: The full documentation of
   :func:`db.collection.ensureIndex()`, :dbcommand:`ensureIndex`,
   and :dbcommand:`shardcollection`.

Considerations
~~~~~~~~~~~~~~

- Your application must catch errors when inserting documents into the
  "proxy" collection and must enforce consistency between the two
  collections.

- If the proxy collection requires sharding, you must shard on the
  single field on which you want to enforce uniqueness.

- To enforce uniqueness on more than one field, you must either have
  *one* proxy collection for *every* field on which you want to
  enforce uniqueness, or create multiple unique indexes on a single
  proxy collection. In the latter case, you will *not* be able to
  shard the proxy collection.
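
The first consideration, catching errors and keeping the two collections consistent, might be handled along the following lines. This is a sketch under stated assumptions, not a definitive implementation: ``FakeCollection`` is a hypothetical in-memory stand-in used so the example runs without a server, ``insertWithUniqueEmail`` is an illustrative name, and the stand-in's ``insertOne`` and ``deleteOne`` methods only imitate the behavior needed here. The one MongoDB-specific detail relied upon is that duplicate key errors carry the code ``11000``.

```javascript
// Hypothetical in-memory stand-in for a collection with unique indexes.
// insertOne throws an error with code 11000 (MongoDB's duplicate key
// error code) when a unique field value is already present.
class FakeCollection {
  constructor(uniqueFields) {
    this.uniqueFields = uniqueFields; // e.g. ["_id", "email"]
    this.docs = [];
  }
  insertOne(doc) {
    for (const field of this.uniqueFields) {
      if (this.docs.some((d) => d[field] === doc[field])) {
        const err = new Error("duplicate key on " + field);
        err.code = 11000;
        throw err;
      }
    }
    this.docs.push({ ...doc });
  }
  deleteOne(query) {
    const i = this.docs.findIndex((d) => d._id === query._id);
    if (i !== -1) this.docs.splice(i, 1);
  }
}

// Reserve the unique value in the proxy collection first, then write the
// full document. Returns false if the email is already taken.
function insertWithUniqueEmail(proxy, information, doc) {
  try {
    proxy.insertOne({ _id: doc._id, email: doc.email });
  } catch (err) {
    if (err.code === 11000) return false; // value already reserved
    throw err;
  }
  try {
    information.insertOne(doc);
  } catch (err) {
    // Compensate: release the reservation so both collections agree.
    proxy.deleteOne({ _id: doc._id });
    throw err;
  }
  return true;
}

const proxy = new FakeCollection(["_id", "email"]);
const information = new FakeCollection(["_id"]);

// Sample documents (illustrative values only).
const first = insertWithUniqueEmail(proxy, information,
  { _id: 1, email: "a@example.net", name: "A" });
const second = insertWithUniqueEmail(proxy, information,
  { _id: 2, email: "a@example.net", name: "B" });
```

Removing the proxy entry when the second insert fails keeps a failed write from permanently reserving the value. The two writes are still not atomic, so an application crash between them can leave an orphaned proxy entry that needs separate cleanup.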