diff --git a/source/release-notes/2.4.txt b/source/release-notes/2.4.txt index da0477d94af..20cd3a43e17 100644 --- a/source/release-notes/2.4.txt +++ b/source/release-notes/2.4.txt @@ -40,6 +40,396 @@ process. Changes ------- +Text Indexes +~~~~~~~~~~~~ + +.. note:: + + The ``text`` index type is currently an experimental feature and + you must enable it at run time. Interfaces and on-disk format may + change in future releases. + +Background +`````````` + +MongoDB 2.3.2 includes a new ``text`` index type. ``text`` indexes +support boolean text search queries. Any set of fields containing +string data may be text indexed. You may only maintain a single +``text`` index per collection. ``text`` indexes are fully consistent +and updated in real time as applications insert, update, or delete +documents from the database. The ``text`` index and query system +supports language-specific stemming and stop words. Additionally: + +- Indexes and queries drop stop words (e.g. "the," "an," "a," "and," + etc.) + +- MongoDB stores words stemmed during insertion in the index, using + simple suffix stemming, including support for a number of + languages. MongoDB automatically stems :dbcommand:`text` queries + before beginning the query. + +However, ``text`` indexes have large storage requirements and incur +**significant** performance costs: + +- Text indexes can be large. They contain one index entry for each + unique word indexed for each document inserted. + +- Building a ``text`` index is very similar to building a large + multi-key index, and therefore may take longer than building a + simple ordered (scalar) index. + +- ``text`` indexes will impede insertion throughput, because MongoDB + must add an index entry for each unique word in each indexed field + of each new source document.
+ +- Some :dbcommand:`text` searches may affect performance on your + :program:`mongod`, particularly for negation queries and phrase + matches that cannot use the index as effectively as other kinds of + queries. + +Additionally, the current *experimental* implementation of ``text`` +indexes has the following limitations and behaviors: + +- ``text`` indexes do not store phrases or information about the + proximity of words in the documents. As a result, phrase queries + will run much more efficiently when the entire collection fits in + RAM. + +- MongoDB does not stem phrases or negations in :dbcommand:`text` + queries. + +- The index is case-insensitive. + +- A collection may only have a single ``text`` index at a time. + +.. important:: Do not enable or use ``text`` indexes on production + systems. + +.. May be worth including this: + + For production-grade search requirements, consider using a + third-party search tool, and the `mongo-connector + `_ or a similar + integration strategy to provide more advanced search capabilities. + +Test ``text`` Indexes +````````````````````` + +.. important:: The ``text`` index type is an experimental feature and + you must enable the feature before creating or accessing a text + index. To enable text indexes, issue the following command at the + :program:`mongo` shell: + + .. code-block:: javascript + + db.adminCommand( { setParameter: 1, textSearchEnabled: true } ) + + You can also start the :program:`mongod` with the following + invocation: + + .. code-block:: sh + + mongod --setParameter textSearchEnabled=true + +Create Text Indexes +^^^^^^^^^^^^^^^^^^^ + +To create a ``text`` index, use the following invocation of +:method:`~db.collection.ensureIndex()`: + +.. code-block:: javascript + + db.collection.ensureIndex( { content: "text" } ) + +``text`` indexes catalog all string data in the ``content`` field.
Your +``text`` index can include content from multiple fields, or arrays, +and from fields in sub-documents, as in the following: + +.. code-block:: javascript + + db.collection.ensureIndex( { content: "text", + "users.comments": "text", + "users.profiles": "text" } ) + +The default name for the index consists of the ```` +concatenated with ``_text``, as in the following: + +.. code-block:: javascript + + "content_text_users.comments_text_users.profiles_text" + +These indexes may run into the :limit:`Index Name Length` limit. To +avoid creating an index with a name that is too long, you can specify a name +in the options parameter, as in the following: + +.. code-block:: javascript + + db.collection.ensureIndex( { content: "text", + "users.profiles": "text" }, + { name: "TextIndex" } ) + +When creating ``text`` indexes you may specify *weights* for specific +fields. *Weights* are factored into the relevance score for each +document. The score for a given word in a document is the weighted sum +of the word's frequency in each of the indexed fields in that document. +Consider the following: + +.. code-block:: javascript + + db.collection.ensureIndex( { content: "text", + "users.profiles": "text" }, + { name: "TextIndex", + weights: { content: 1, + "users.profiles": 2 } } ) + +This example creates a ``text`` index on the top-level field named +``content`` and the ``profiles`` field in the ``users`` +sub-documents. Furthermore, the ``content`` field has a weight of 1 and +the ``users.profiles`` field has a weight of 2. + +You can add one or more conventional ascending or descending index fields as a +prefix or suffix of the index so that queries can limit the number of +index entries they must review. You cannot +include a :ref:`multi-key ` index field or a +:ref:`geospatial ` index field. + +If you create an ascending or descending index field as a prefix of a +``text`` index: + +- MongoDB will only index documents that have the prefix field + (i.e.
``username``), and + +- all :dbcommand:`text` queries using this index must specify the + prefix field in the ``filter`` query. + +Create this index with the following operation: + +.. code-block:: javascript + + db.collection.ensureIndex( { username: 1, + "users.profiles": "text" } ) + +Alternatively, you can create an ascending or descending index field as a suffix +to a ``text`` index. Then the ``text`` index can support +:ref:`covered queries ` if the +:dbcommand:`text` command specifies a ``projection`` option. + +Create this index with the following operation: + +.. code-block:: javascript + + db.collection.ensureIndex( { "users.profiles": "text", + username: 1 } ) + +Finally, you may use the special wildcard field specifier (i.e. +``$**``) to specify index weights and fields. Consider the following +example that indexes any string value in the data of every field of +every document in a collection and names the index ``TextIndex``: + +.. code-block:: javascript + + db.collection.ensureIndex( { "$**": "text", + username: 1 }, + { name: "TextIndex" } ) + +By default, an index field has a weight of ``1``. You may specify +weights for a ``text`` index with compound fields, as in the following: + +.. code-block:: javascript + + db.collection.ensureIndex( { content: "text", + "users.profiles": "text", + comments: "text", + keywords: "text", + about: "text" }, + { name: "TextIndex", + weights: + { content: 10, + "users.profiles": 2, + keywords: 5, + about: 5 } } ) + +This index, named ``TextIndex``, includes a number of fields, with the +following weights: + +- ``content``, which has a weight of 10, +- ``users.profiles``, which has a weight of 2, +- ``comments``, which has the default weight of 1, +- ``keywords``, which has a weight of 5, and +- ``about``, which has a weight of 5.
+ +This means that documents that match words in the ``content`` field +will tend to rank higher in the result set than documents that match only in other indexed fields, +and that matches in the ``users.profiles`` and ``comments`` fields contribute less +to a document's score than matches in the other fields. + +Text Queries +^^^^^^^^^^^^ + +MongoDB 2.3.2 introduces the :dbcommand:`text` command to provide +query support for ``text`` indexes. Unlike normal MongoDB queries, +:dbcommand:`text` returns a document rather than a +cursor. + +.. dbcommand:: text + + The :dbcommand:`text` command provides an interface to search text content + stored in the ``text`` index. Consider the following prototype of + :dbcommand:`text`: + + .. code-block:: javascript + + db.collection.runCommand( "text", { search: , + filter: , + projection: , + limit: , + language: } ) + + The :dbcommand:`text` command has the following parameters: + + :param string search: + + A text string that MongoDB stems and uses to query the ``text`` + index. When specifying phrase matches, you must escape quote + characters as ``\"``. + + :param document filter: + + Optional. A :ref:`query document ` to + further limit the results of the query using another database + field. You can use any valid MongoDB query in the filter + document, except if the index includes an ascending or descending + index field as a prefix. + + If the index includes an ascending or descending index field as a prefix, the + ``filter`` is required and the ``filter`` query must be an + equality match. + + :param document projection: + + Optional. Allows you to limit the fields returned by the query + to only those specified. + + :param number limit: + + Optional. Specify the maximum number of documents to include in + the response. + + :param string language: + + Optional. Specify the language that determines the tokenization, + stemming, and the stop words for the search. + + :return: + + :dbcommand:`text` returns results in the form of a + document.
Results must fit within the :limit:`BSON Document + Size`. Use a projection setting to limit the size of the result + set. + + The implicit connector between the terms of a multi-term search is a + disjunction (``OR``). A search for ``"first second"`` searches + for ``"first"`` or ``"second"``. The scoring system will prefer + documents that contain all terms. + + However, consider the following behaviors of :dbcommand:`text` + queries: + + - With phrases (i.e. terms enclosed in escaped quotes), the search + performs an ``AND`` with any other terms in the search string; + e.g. a search for ``"\"twinkle twinkle\" little star"`` searches for + ``"twinkle twinkle"`` and (``"little"`` or ``"star"``). + + - :dbcommand:`text` adds all negations to the query with the + logical ``AND`` operator. + +.. example:: + + Consider the following examples of :dbcommand:`text` queries. All + examples assume that you have a ``text`` index on the field named + ``content`` in a collection named ``collection``. + + #. Create a ``text`` index on the ``content`` field to enable text + search on the field: + + .. code-block:: javascript + + db.collection.ensureIndex( { content: "text" } ) + + #. Search for a single word ``search``: + + .. code-block:: javascript + + db.collection.runCommand( "text", { search: "search" } ) + + This query returns documents that contain the word + ``search``, case-insensitive, in the ``content`` field. + + #. Search for multiple words, ``create`` or ``search`` or ``fields``: + + .. code-block:: javascript + + db.collection.runCommand( "text", { search: "create search fields" } ) + + This query returns documents that contain either ``create`` + **or** ``search`` **or** ``fields`` in the ``content`` field. + + #. Search for the exact phrase ``create search fields``: + + ..
code-block:: javascript + + db.collection.runCommand( "text", { search: "\"create search fields\"" } ) + + This query returns documents that contain the exact phrase + ``create search fields``. + + #. Search for documents that contain the words ``create`` or ``search``, + but **not** ``fields``: + + .. code-block:: javascript + + db.collection.runCommand( "text", { search: "create search -fields" } ) + + Use the ``-`` as a prefix to terms to specify negation in the + search string. The query returns documents that contain + either ``create`` **or** ``search``, but **not** ``fields``, all + case-insensitive, in the ``content`` field. Prefixing a word + with a hyphen (``-``) negates that word: + + - The negated word filters out documents from the result set, + after selecting documents. + + - A ```` that only contains negative words returns no match. + + - A hyphenated word, such as ``case-insensitive``, is not a + negation. The :dbcommand:`text` command treats the hyphen + as a delimiter. + + #. Search for a single word, ``insensitive``, with an additional ``filter`` on + the ``about`` field, but **limit** the results to 2 documents with the + highest score and return only the ``comments`` field in the matching + documents: + + .. code-block:: javascript + + db.collection.runCommand( "text", { + search: "insensitive", + filter: { about: /something/ }, + limit: 2, + projection: { comments: 1, _id: 0 } + } + ) + + - The ``filter`` :ref:`query document ` + uses a :operator:`regular expression <$regex>`. See the + :ref:`query operators ` page for available query + operators. + + - The ``projection`` must explicitly exclude (``0``) the ``_id`` + field. Within the ``projection`` document, you cannot mix + inclusions (i.e. ``: 1``) and exclusions (i.e. ``: + 0``), except for the ``_id`` field. + Additional Authentication Features ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -258,11 +648,6 @@ for ``s2d`` indexes as ``2d`` indexes. .. operator:: $intersect -..
note:: - - In 2.3.2, the :operator:`$intersect` operator will - become :operator:`$geoIntersects` - The :operator:`$intersect` selects all indexed points that intersect with the provided geometry. (i.e. ``Point``, ``LineString``, and ``Polygon``.) You must pass :operator:`$intersect` a document