| 1 | +=========================================== |
| 2 | +Enforce Unique Keys for Sharded Collections |
| 3 | +=========================================== |
| 4 | + |
| 5 | +.. default-domain:: mongodb |
| 6 | + |
| 7 | +Overview |
| 8 | +-------- |
| 9 | + |
| 10 | +The :dbcommand:`unique <ensureIndex>` constraint on indexes ensures |
| 11 | +that no two :term:`documents <document>` in a :term:`collection` |
| 12 | +have the same value for a field. For :ref:`sharded collections these |
| 13 | +unique indexes cannot enforce uniqueness |
| 14 | +<limit-sharding-unique-indexes>` because insert and indexing |
| 15 | +operations are local to specific shards, and a shard has no |
| 16 | +knowledge of the documents stored on other shards. |
| 17 | + |
| 18 | +If you specify a unique index on a sharded collection, MongoDB will |
| 19 | +only be able to enforce uniqueness among the documents located on a |
| 20 | +single shard *at time of creation*. This document provides two |
| 21 | +approaches to enforcing uniqueness: |
| 22 | + |
| 23 | +#. Enforce uniqueness of the shard key. |
| 24 | + |
| 25 | + MongoDB *can* enforce uniqueness on the shard key. However, for |
| 26 | + multi-component keys, MongoDB enforces uniqueness on the |
| 27 | + *entire* key combination, and not on a specific component of the |
| 28 | + shard key. |
| 29 | + |
| 30 | +#. Use a secondary collection to enforce uniqueness. |
| 31 | + |
| 32 | + Create a minimal collection that contains just the unique field and |
| 33 | + a reference to a document in the main collection. If you always |
| 34 | + insert into the secondary collection *before* inserting into the |
| 35 | + main collection, MongoDB will produce an error if you attempt to |
| 36 | + use a duplicate key. |
| 37 | + |
| 38 | + .. note:: |
| 39 | + |
| 40 | + If you have a small data set, you may not need to shard this |
| 41 | + collection and you can create multiple unique indexes; otherwise |
| 42 | + you can shard on a single unique key. |
| 43 | + |
| 44 | +Unique Constraints on the Shard Key |
| 45 | +----------------------------------- |
| 46 | + |
| 47 | +.. _sharding-pattern-unique-procedure-shard-key: |
| 48 | + |
| 49 | +Process |
| 50 | +~~~~~~~ |
| 51 | + |
| 52 | +When sharding a collection using the :dbcommand:`shardcollection` |
| 53 | +command you can specify a ``unique`` constraint, in the following |
| 54 | +form: |
| 55 | + |
| 56 | +.. code-block:: javascript |
| 57 | + |
| 58 | + db.runCommand( { shardcollection : "test.users" , key : { email : 1 } , unique : true } ); |
| 59 | + |
| 60 | +Remember that the index on the ``_id`` field is always unique. By |
| 61 | +default, MongoDB inserts an ``ObjectId`` into the ``_id`` field, but |
| 62 | +you can supply your own ``_id`` value and use this field as the shard |
| 63 | +key. MongoDB will then ensure that the ``_id`` field is unique in the |
| 64 | +sharded collection. Use the following operation to use the ``_id`` |
| 65 | +field as the shard key: |
| 66 | + |
| 67 | +.. code-block:: javascript |
| 68 | + |
| 69 | + db.runCommand( { shardcollection : "test.users" , key : { _id : 1 } } ) |
| 70 | + |
| 71 | +.. note:: |
| 72 | + |
| 73 | + In any sharded collection where you are *not* sharding by the |
| 74 | + ``_id`` field, you and your application are responsible for |
| 75 | + maintaining the uniqueness of the ``_id`` field. The best way to |
| 76 | + ensure unique ``_id`` values is to use an ``ObjectId`` or another |
| 77 | + universally unique identifier (UUID). |
| 78 | + |
| 79 | +Limitations |
| 80 | +~~~~~~~~~~~ |
| 81 | + |
| 82 | +- You can only enforce uniqueness on one field in a collection using |
| 83 | + this method. |
| 84 | + |
| 85 | +- If you use a compound shard key, you will only be able to enforce |
| 86 | + uniqueness of the *combination* of component keys in the shard |
| 87 | + key. |
| 88 | + |
| 89 | +In most cases, the best shard keys are compound keys that include |
| 90 | +elements that permit :ref:`write scaling <sharding-shard-key-write-scaling>` |
| 91 | +and :ref:`query isolation <sharding-shard-key-query-isolation>` as |
| 92 | +well as :ref:`high cardinality <sharding-shard-key-cardinality>`. |
| 93 | +These ideal shard keys are often not the keys that require |
| 94 | +uniqueness, which calls for a different approach. |
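
To illustrate the compound-key limitation listed above, the following
sketch models a unique compound key in plain JavaScript. It is not
MongoDB code: the ``makeUniqueCompoundIndex`` helper and the
``{ region, email }`` key are hypothetical names chosen for the
example. It only shows that the uniqueness check applies to the whole
combination of key fields, so the same ``email`` can appear twice as
long as ``region`` differs.

```javascript
// Plain JavaScript model of a unique compound key; not MongoDB code.
// Hypothetical key fields: region and email.
function makeUniqueCompoundIndex(fields) {
  const seen = new Set();
  return function tryInsert(doc) {
    // The uniqueness check covers the whole combination of key fields.
    const key = JSON.stringify(fields.map(function (f) { return doc[f]; }));
    if (seen.has(key)) {
      return false; // duplicate combination: rejected
    }
    seen.add(key);
    return true; // new combination: accepted
  };
}

const tryInsert = makeUniqueCompoundIndex(["region", "email"]);
const results = [
  tryInsert({ region: "eu", email: "a@example.net" }), // accepted
  tryInsert({ region: "us", email: "a@example.net" }), // accepted: same email, different region
  tryInsert({ region: "eu", email: "a@example.net" })  // rejected: same combination
];
```

If ``email`` alone must be unique, a compound key of this kind is not
enough, which is the case the next section addresses.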
| 95 | + |
| 96 | +Unique Constraints on Arbitrary Fields |
| 97 | +-------------------------------------- |
| 98 | + |
| 99 | +If you cannot use your unique field as the shard key, or you need to |
| 100 | +enforce uniqueness over more than one field, you must create another |
| 101 | +:term:`collection` that contains both a reference to the original |
| 102 | +document (i.e. its ``ObjectId``) and the unique key. |
| 103 | + |
| 104 | +If you must shard this "proxy" collection, then shard on the unique |
| 105 | +key using the :ref:`above procedure |
| 106 | +<sharding-pattern-unique-procedure-shard-key>`; otherwise, you can |
| 107 | +simply create unique indexes on the collection. |
| 108 | + |
| 109 | +Process |
| 110 | +~~~~~~~ |
| 111 | + |
| 112 | +Consider the following schema for the "proxy" collection: |
| 113 | + |
| 114 | +.. code-block:: javascript |
| 115 | + |
| 116 | + { |
| 117 | +    "_id" : ObjectId("..."), |
| 118 | +    "email" : "..." |
| 119 | + } |
| 120 | + |
| 121 | +Here, the ``_id`` field holds the ``ObjectId`` of the |
| 122 | +:term:`document` it reflects, and the ``email`` field is the field on |
| 123 | +which you want to ensure uniqueness. |
| 124 | + |
| 125 | +If you're going to shard this collection, use the following operation |
| 126 | +to shard the collection, using the ``email`` field as the :term:`shard |
| 127 | +key`: |
| 128 | + |
| 129 | +.. code-block:: javascript |
| 130 | + |
| 131 | + db.runCommand( { shardcollection : "records.proxy" , key : { email : 1 } , unique : true } ); |
| 132 | + |
| 133 | +If you do not need to shard the proxy collection, use the following |
| 134 | +command to create a unique index on the ``email`` field: |
| 135 | + |
| 136 | +.. code-block:: javascript |
| 137 | + |
| 138 | + db.proxy.ensureIndex( { "email" : 1 }, {unique : true} ) |
| 139 | + |
| 140 | +You may create multiple unique indexes on this collection if you do |
| 141 | +not plan to shard its contents. |
| 142 | + |
| 143 | +Then, to insert documents, use the following procedure in the |
| 144 | +:ref:`JavaScript shell <mongo>`: |
| 145 | + |
| 146 | +.. code-block:: javascript |
| 147 | + |
| 148 | + use records |
| 149 | + |
| 150 | + var primary_id = ObjectId() |
| 151 | + |
| 152 | + db.proxy.insert({ |
| 153 | +    "_id" : primary_id, |
| 154 | +    "email" : "..." |
| 155 | + }) |
| 156 | + |
| 157 | + // if the above operation returns successfully, |
| 158 | + // then continue: |
| 159 | + |
| 160 | + db.information.insert({ |
| 161 | +    "_id" : primary_id, |
| 162 | +    "email" : "..." |
| 163 | +    // additional information... |
| 164 | + }) |
| 165 | + |
| 166 | +You must insert a document into the ``proxy`` collection first. If |
| 167 | +this operation succeeds, the ``email`` field is unique and you may |
| 168 | +continue by inserting the actual document into the ``information`` |
| 169 | +collection. |
| 170 | + |
| 171 | +.. seealso:: The full documentation of :func:`db.collection.ensureIndex()`, |
| 172 | + :dbcommand:`ensureIndex`, and :dbcommand:`shardcollection`. |
| 173 | + |
| 174 | +Considerations |
| 175 | +~~~~~~~~~~~~~~ |
| 176 | + |
| 177 | +- Your application must catch errors on inserting documents into the |
| 178 | + "proxy" collection and enforce consistency between the two |
| 179 | + collections. |
| 180 | + |
| 181 | +- If the proxy collection requires sharding, you must shard on the |
| 182 | + single field on which you want to enforce uniqueness. |
| 183 | + |
| 184 | +- To enforce uniqueness on more than one field, you must have *one* |
| 185 | + proxy collection for *every* field on which you want to enforce |
| 186 | + uniqueness. Alternatively, you can create multiple unique indexes |
| 187 | + on a single proxy collection, but you will *not* be able to shard |
| 188 | + that collection. |
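
The first consideration, catching errors and keeping the two
collections consistent, might be handled along the following lines.
This is a minimal sketch under stated assumptions, not a definitive
implementation: ``insertWithUniqueEmail`` is a hypothetical helper, and
``proxy`` and ``information`` are assumed to be collection objects
whose ``insertOne()`` and ``deleteOne()`` methods run synchronously and
throw on failure, as shell-style helpers do. With an asynchronous
driver, you would need to await each call. The sketch relies on
MongoDB reporting duplicate key violations with error code ``11000``.

```javascript
// Sketch only: `proxy` and `information` are assumed to be collection
// objects with synchronous insertOne() and deleteOne() that throw on error.
function insertWithUniqueEmail(proxy, information, doc) {
  // Step 1: claim the unique value in the proxy collection first.
  try {
    proxy.insertOne({ _id: doc._id, email: doc.email });
  } catch (err) {
    if (err && err.code === 11000) {
      // Duplicate key: another document already uses this email.
      return { ok: false, reason: "duplicate email" };
    }
    throw err; // any other error is not handled here
  }

  // Step 2: insert the actual document. If this fails, remove the
  // proxy entry so the two collections stay consistent.
  try {
    information.insertOne(doc);
  } catch (err) {
    proxy.deleteOne({ _id: doc._id });
    throw err;
  }
  return { ok: true };
}
```

The two inserts are not atomic, so a failure between them, or during
the cleanup itself, can still leave an orphaned proxy entry that the
application must detect and remove.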