From 34054ee21891d502e81fc885850e7d9ad10dfdda Mon Sep 17 00:00:00 2001 From: kay Date: Tue, 26 Feb 2013 11:32:47 -0500 Subject: [PATCH] DOCS-1147 and DOCS-1206 text search DOCS-1206 ulimits Add note to Keyword Search Tutorial to link to Text Search DOCS-1147 text index sharded cluster and replica sets and reorg draft of the text-search usage fix information about hyphens and copy edits added more info DOCS-1147 text search --- source/applications.txt | 15 + source/applications/text-search.txt | 111 ++++ source/contents.txt | 1 + source/core/indexes.txt | 46 ++ source/core/text-index.txt | 186 ++++++ source/includes/fact-text-index-limit-one.rst | 1 + source/includes/fact-text-search-beta.rst | 10 + ...warning-text-search-not-for-production.rst | 10 + source/indexes.txt | 1 + source/reference.txt | 1 + source/reference/command/setParameter.txt | 1 + source/reference/command/text.txt | 211 +++++++ .../method/db.collection.ensureIndex.txt | 226 +++++--- source/reference/mongod.txt | 1 + source/reference/parameters.txt | 18 + source/reference/text-search.txt | 171 ++++++ source/reference/user-privileges.txt | 2 +- source/release-notes/2.4.txt | 545 +++--------------- source/tutorial.txt | 14 + ...ext-index-on-multi-language-collection.txt | 97 ++++ source/tutorial/enable-text-search.txt | 32 + ...umber-of-items-scanned-for-text-search.txt | 109 ++++ .../model-data-for-keyword-search.txt | 17 +- ...urn-text-queries-using-only-text-index.txt | 41 ++ source/tutorial/search-for-text.txt | 238 ++++++++ 25 files changed, 1555 insertions(+), 550 deletions(-) create mode 100644 source/applications/text-search.txt create mode 100644 source/core/text-index.txt create mode 100644 source/includes/fact-text-index-limit-one.rst create mode 100644 source/includes/fact-text-search-beta.rst create mode 100644 source/includes/warning-text-search-not-for-production.rst create mode 100644 source/reference/command/text.txt create mode 100644 source/reference/text-search.txt create mode 
100644 source/tutorial/create-text-index-on-multi-language-collection.txt create mode 100644 source/tutorial/enable-text-search.txt create mode 100644 source/tutorial/limit-number-of-items-scanned-for-text-search.txt create mode 100644 source/tutorial/return-text-queries-using-only-text-index.txt create mode 100644 source/tutorial/search-for-text.txt diff --git a/source/applications.txt b/source/applications.txt index 91f1779dc51..b17e7c7c12d 100644 --- a/source/applications.txt +++ b/source/applications.txt @@ -57,3 +57,18 @@ The following documents provide patterns for developing application features: tutorial/isolate-sequence-of-operations tutorial/create-an-auto-incrementing-field tutorial/expire-data + +Text Search Patterns +-------------------- + +The following tutorials provide some patterns for +text search usage: + +.. toctree:: + :maxdepth: 1 + + tutorial/enable-text-search + tutorial/search-for-text + tutorial/create-text-index-on-multi-language-collection + tutorial/return-text-queries-using-only-text-index + tutorial/limit-number-of-items-scanned-for-text-search diff --git a/source/applications/text-search.txt b/source/applications/text-search.txt new file mode 100644 index 00000000000..730ca144470 --- /dev/null +++ b/source/applications/text-search.txt @@ -0,0 +1,111 @@ +=========== +Text Search +=========== + +.. default-domain:: mongodb + +.. versionadded:: 2.4 + +Overview +-------- + +Text search supports the search of string content in documents of a +collection. Text search introduces a new :ref:`text +` index type and a new :dbcommand:`text` command. + +The text search process: + +- tokenizes and stems the search term(s) during both the index creation + and the text command execution. + +- assigns a score to each document that contains the search term in the + indexed fields. The score determines the relevance of a document to a + given search query. 
+ +By default, the :dbcommand:`text` command returns at most the top 100 +matching documents as determined by the scores. + +.. _create-text-index: + +Create a ``text`` Index +----------------------- + +To perform text search, create a ``text`` index on the field or fields +whose value is a string or an array of string elements. To create a +``text`` index, use the :method:`db.collection.ensureIndex()` method +with a document that contains field and value pairs where the value is +the string literal ``text``. + +.. important:: + + - Before you can :ref:`create a text index ` or + :ref:`run the text command `, you must + manually enable text search. See + :doc:`/tutorial/enable-text-search` for information on how to + enable the text search feature. + + - Text indexes have significant storage requirements and performance + costs. See :ref:`text index feature ` for more + information. + + - .. include:: /includes/fact-text-index-limit-one.rst + +The following example creates a ``text`` index on the fields +``subject`` and ``content``: + +.. code-block:: javascript + + db.collection.ensureIndex( + { + subject: "text", + content: "text" + } + ) + +This ``text`` index catalogs all string data in the ``subject`` field +and the ``content`` field, where the field value is either a string or +an array of string elements. + +See :doc:`/core/text-index` for details on the options available when +creating ``text`` indexes. + +You can also combine ``text`` indexes with +ascending/descending index fields. See: + +- :doc:`/tutorial/limit-number-of-items-scanned-for-text-search` + +- :doc:`/tutorial/return-text-queries-using-only-text-index` + +.. _text-search-text-command: + +``text`` Command +---------------- + +The :dbcommand:`text` command can search for words and phrases. The +command matches on the complete stemmed words. For example, if a +document field contains the word ``blueberry``, a search on the term +``blue`` will not match the document. 
However, a search on either +``blueberry`` or ``blueberries`` will match. + +By default, the :dbcommand:`text` command returns the top 100 scoring documents +in descending order, but you can specify a ``limit`` option to change +the maximum number to return. + +Given a collection with a ``text`` index, use the +:method:`~db.collection.runCommand()` method to execute the +:dbcommand:`text` command, as in: + +.. code-block:: javascript + + db.collection.runCommand( "text" , { search: } ) + +For information and examples on various text search patterns, see +:doc:`/tutorial/search-for-text`. + +Text Search Output +------------------ + +The :dbcommand:`text` command returns a document that contains the +result set. + +See :ref:`text-search-output` for information on the output. diff --git a/source/contents.txt b/source/contents.txt index e6644a3f7c2..d305e2b5b05 100644 --- a/source/contents.txt +++ b/source/contents.txt @@ -10,6 +10,7 @@ MongoDB Manual Contents security crud aggregation + applications/text-search indexes replication sharding diff --git a/source/core/indexes.txt b/source/core/indexes.txt index 1dcef62b77e..71eaa2ba73f 100644 --- a/source/core/indexes.txt +++ b/source/core/indexes.txt @@ -770,6 +770,52 @@ indexes are not suited for finding the closest documents to a particular location, when the closest documents are far away compared to bucket size. + +.. index:: index; text +.. index:: text index +.. _index-feature-text: + +``text`` Indexes +~~~~~~~~~~~~~~~~ + +.. versionadded:: 2.4 + +MongoDB provides ``text`` indexes to support :doc:`text search +` on a collection. You can only access the +``text`` index with the :dbcommand:`text` command. + +``text`` indexes are case-insensitive and can include any field that +contains string data. ``text`` indexes drop language-specific stop +words (e.g. in English, “the,” “an,” “a,” “and,” etc.) and use simple +language-specific suffix stemming. See :ref:`text-search-languages` for +the supported languages. 
+ +``text`` indexes have the following storage requirements and +performance costs: + +- Text indexes can be large. They contain one index entry for each + unique post-stemmed word in each indexed field for each document + inserted. + +- Building a ``text`` index is very similar to building a large + multi-key index, and will take longer than building a simple ordered + (scalar) index on the same data. + +- When building a large ``text`` index on an existing collection, + ensure that you have a sufficiently-high open file descriptor limit. + See the :ref:`recommended settings `. + +- ``text`` indexes will impact insertion throughput because MongoDB + must add an index entry for each unique post-stemmed word in each + indexed field of each new source document. + +- Additionally, ``text`` indexes do not store phrases or information + about the proximity of words in the documents. As a result, phrase + queries will run much more effectively when the entire collection + fits in RAM. + +See :doc:`/applications/text-search` for more information on the text +search feature. + .. index:: index; limitations .. _index-limitations: diff --git a/source/core/text-index.txt b/source/core/text-index.txt new file mode 100644 index 00000000000..3a06ed7b962 --- /dev/null +++ b/source/core/text-index.txt @@ -0,0 +1,186 @@ +:orphan: + +============== +``text`` Index +============== + +.. default-domain:: mongodb + +This document provides details on some of the options available when +creating ``text`` indexes. + +Specify a Name for the ``text`` Index +------------------------------------- + +The default name for the index consists of each index field name +concatenated with ``_text``. Consider the ``text`` index on the fields +``content``, ``users.comments``, and ``users.profiles``. + +.. code-block:: javascript + + db.collection.ensureIndex( + { + content: "text", + "users.comments": "text", + "users.profiles": "text" + } + ) + +The default name for the index is: + +.. 
code-block:: javascript + + "content_text_users.comments_text_users.profiles_text" + +To avoid creating an index with a name that exceeds the :limit:`index +name length limit `, you can pass the ``name`` +option to the :method:`db.collection.ensureIndex()` method: + +.. code-block:: javascript + + db.collection.ensureIndex( + { + content: "text", + "users.comments": "text", + "users.profiles": "text" + }, + { + name: "MyTextIndex" + } + ) + +.. note:: + + To drop the ``text`` index, use the index name. To get the name of + an index, use :method:`db.collection.getIndexes()`. + +Index All Fields +---------------- + +To allow for text search on all fields with string content, use the +wildcard specifier (``$**``) to index all fields that contain string +content. + +The following example indexes any string value in the data of every +field of every document in a collection and names it ``TextIndex``: + +.. code-block:: javascript + + db.collection.ensureIndex( + { "$**": "text" }, + { name: "TextIndex" } + ) + +.. _text-index-default-language: + +Specify Languages for Text Index +-------------------------------- + +The default language associated with the indexed data determines the +list of stop words and the rules for the stemmer and tokenizer. The +default language for the indexed data is ``english``. + +To specify a different language, use the ``default_language`` option +when creating the ``text`` index. See :ref:`text-search-languages` for +the languages available for ``default_language``. + +The following example creates a ``text`` index on the +``content`` field and sets the ``default_language`` to +``spanish``: + +.. code-block:: javascript + + db.collection.ensureIndex( + { content : "text" }, + { default_language: "spanish" } + ) + +.. seealso:: + + :doc:`/tutorial/create-text-index-on-multi-language-collection` + +.. 
_text-index-internals-weights: + +Control Results of Text Search with Weights +------------------------------------------- + +By default, the :dbcommand:`text` command returns matching documents +based on scores, from highest to lowest. For a ``text`` index, the +*weight* of an indexed field denotes the significance of the field +relative to the other indexed fields in terms of the score. The score +calculation for a given word in a document includes the weighted sum of +the frequency for each of the indexed fields in that document. + +The default weight is 1 for the indexed fields. To adjust the weights +for the indexed fields, include the ``weights`` option in the +:method:`db.collection.ensureIndex()` method. + +.. warning:: + + Choose the weights carefully in order to prevent the need to reindex. + +A collection ``blog`` has the following documents: + +.. code-block:: javascript + + { _id: 1, + content: "This morning I had a cup of coffee.", + about: "beverage", + keywords: [ "coffee" ] + } + + { _id: 2, + content: "Who doesn't like cake?", + about: "food", + keywords: [ "cake", "food", "dessert" ] + } + +To create a ``text`` index with different field weights for the +``content`` field and the ``keywords`` field, pass the ``weights`` +option to the :method:`~db.collection.ensureIndex()` method. + +.. code-block:: javascript + + db.blog.ensureIndex( + { + content: "text", + keywords: "text", + about: "text" + }, + { + weights: { + content: 10, + keywords: 5, + }, + name: "TextIndex" + } + ) + +The ``text`` index has the following fields and weights: + +- ``content`` has a weight of 10, + +- ``keywords`` has a weight of 5, and + +- ``about`` has the default weight of 1. + +These weights denote the relative significance of the indexed fields to +each other. For instance, a term match in the ``content`` field has: + +- ``2`` times (i.e. ``10:5``) the impact of a term match in the + ``keywords`` field and + +- ``10`` times (i.e. 
``10:1``) the impact of a term match in the + ``about`` field. + +Tutorials +--------- + +The following tutorials offer additional ``text`` index creation +patterns: + +- :doc:`/tutorial/create-text-index-on-multi-language-collection` + +- :doc:`/tutorial/limit-number-of-items-scanned-for-text-search` + +- :doc:`/tutorial/return-text-queries-using-only-text-index` diff --git a/source/includes/fact-text-index-limit-one.rst b/source/includes/fact-text-index-limit-one.rst new file mode 100644 index 00000000000..f31ee3d2178 --- /dev/null +++ b/source/includes/fact-text-index-limit-one.rst @@ -0,0 +1 @@ +A collection can have at most **one** ``text`` index. \ No newline at end of file diff --git a/source/includes/fact-text-search-beta.rst b/source/includes/fact-text-search-beta.rst new file mode 100644 index 00000000000..e976d454f6b --- /dev/null +++ b/source/includes/fact-text-search-beta.rst @@ -0,0 +1,10 @@ +The :doc:`text search ` feature is currently a +*beta* feature. As a beta feature: + +- You need to explicitly enable the feature before :ref:`creating a text + index ` or using the :dbcommand:`text` command. + +- To enable text search on :doc:`replica sets ` and + :doc:`sharded clusters `, you need to + enable it on **each and every** :program:`mongod` for replica + sets and on **each and every** :program:`mongos` for sharded clusters. diff --git a/source/includes/warning-text-search-not-for-production.rst b/source/includes/warning-text-search-not-for-production.rst new file mode 100644 index 00000000000..f2d17497bd9 --- /dev/null +++ b/source/includes/warning-text-search-not-for-production.rst @@ -0,0 +1,10 @@ +.. warning:: + + .. not-for-production + + - Do **not** enable or use text search on production systems. + + .. significant-storage-requirements + + - Text indexes have significant storage requirements and performance + costs. See :ref:`index-feature-text` for more information. 
diff --git a/source/indexes.txt b/source/indexes.txt index 121f7d40208..92b48d25fb0 100644 --- a/source/indexes.txt +++ b/source/indexes.txt @@ -23,3 +23,4 @@ The following outlines the indexing documentation: applications/indexes applications/geospatial-indexes core/geospatial-indexes + applications/text-search diff --git a/source/reference.txt b/source/reference.txt index b845a717fbb..8f21f9584ab 100644 --- a/source/reference.txt +++ b/source/reference.txt @@ -107,6 +107,7 @@ General Reference reference/limits reference/mongodb-extended-json + reference/text-search reference/glossary .. seealso:: The :ref:`genindex` may provide useful insight into the diff --git a/source/reference/command/setParameter.txt b/source/reference/command/setParameter.txt index c43d38f4141..3bcbf1ed410 100644 --- a/source/reference/command/setParameter.txt +++ b/source/reference/command/setParameter.txt @@ -27,5 +27,6 @@ setParameter - :parameter:`replIndexPrefetch` - :parameter:`syncdelay` - :parameter:`traceExceptions` + - :parameter:`textSearchEnabled` .. slave-ok, admin-only diff --git a/source/reference/command/text.txt b/source/reference/command/text.txt new file mode 100644 index 00000000000..70c3a6d4b6a --- /dev/null +++ b/source/reference/command/text.txt @@ -0,0 +1,211 @@ +==== +text +==== + +.. default-domain:: mongodb + +.. dbcommand:: text + + .. versionadded:: 2.4 + + The :dbcommand:`text` command provides an interface to search text + content stored in the :ref:`text index `. By + default, the command limits the matches to the top 100 scoring + documents, in descending score order, but you can specify a + different limit. The :dbcommand:`text` command is + **case-insensitive**. + + The :dbcommand:`text` command has the following syntax: + + .. 
code-block:: javascript + + db.collection.runCommand( "text", { search: , + filter: , + project: , + limit: , + language: } ) + + The :dbcommand:`text` command has the following parameters: + + :param string search: + + A string of terms that MongoDB parses and uses to query the + ``text`` index. The :dbcommand:`text` command returns all + documents that contain any of the terms; i.e. it performs a + logical ``OR`` search. + + Enclose the string of terms in escaped double quotes to match + on the phrase. + + Additionally, the :dbcommand:`text` command treats most + punctuation as delimiters, except when a hyphen '-' is used + to negate terms. + + Prefixing a word with a minus sign (-) negates a word: + + - The negated word excludes documents that contain the + negated word from the result set. + + - A search string that only contains negative words returns + **no** match. + + - A hyphenated word, such as ``pre-market``, is not a + negation. The text command treats the hyphen as a delimiter. + + :param document filter: + + Optional. A :ref:`query document ` to + further limit the results of the query using another database + field. You can use any valid MongoDB query in the filter + document, except if the index includes an ascending or + descending index field as a prefix. + + If the index includes an ascending or descending index field + as a prefix, the ``filter`` is required and the ``filter`` + query must be an equality match. + + :param document project: + + Optional. Allows you to limit the fields returned by the + query to only those specified. + + By default, the ``_id`` field returns as part of the result + set *unless* you explicitly exclude the field in the project + document. + + :param number limit: + + Optional. Specify the maximum number of documents to include + in the response. The :dbcommand:`text` sorts the results + before applying the ``limit``. + + The default limit is 100. + + :param string language: + + Optional. 
Specify the language that determines, for the search, + the list of stop words and the rules for the stemmer and + tokenizer. If not specified, the search uses the default language + of the index. See :ref:`text-search-languages` for the + supported languages. Specify the language in **lowercase**. + + :return: + + A document that contains a field ``results`` that contains + an array of the highest scoring documents, in descending + order by score. See :ref:`text-search-output` for details. + + The returned document must fit within the :limit:`BSON + Document Size`. Use the ``limit`` and the ``project`` + parameters to limit the size of the result set. + + .. note:: + + - If the ``search`` string includes phrases, the search performs + an ``AND`` with any other terms in the search string; e.g. + a search for ``"\"twinkle twinkle\" little star"`` searches for + ``"twinkle twinkle"`` **and** (``"little"`` **or** ``"star"``). + + - :dbcommand:`text` adds all negations to the query with the + logical ``AND`` operator. + + - The :dbcommand:`text` command ignores stop words for the search + language, such as ``the`` and ``and`` in English. + + - The :dbcommand:`text` command matches on the complete *stemmed* + words. So if a document field contains the word ``blueberry``, + a search on the term ``blue`` will not match. However, + ``blueberry`` or ``blueberries`` will match. + + For the following examples, assume a collection ``articles`` has a text + index on the field ``subject``: + + .. code-block:: javascript + + db.articles.ensureIndex( { subject: "text" } ) + + .. example:: Search for a Single Word + + .. code-block:: javascript + + db.articles.runCommand( "text", { search: "coffee" } ) + + This query returns documents that contain the word ``coffee``, + case-insensitive, in the indexed ``subject`` field. + + .. example:: Search for Multiple Words + + The following command searches for ``bake`` or ``coffee`` or ``cake``: + + .. 
code-block:: javascript + + db.articles.runCommand( "text", { search: "bake coffee cake" } ) + + This query returns documents that contain either ``bake`` **or** + ``coffee`` **or** ``cake`` in the indexed ``subject`` field. + + .. example:: Search for a Phrase + + .. code-block:: javascript + + db.articles.runCommand( "text", { search: "\"bake coffee cake\"" } ) + + This query returns documents that contain the phrase ``bake + coffee cake``. + + .. example:: Exclude a Term from the Result Set + + Use the hyphen (``-``) as a prefix to exclude documents that + contain a term. Search for documents that contain the words + ``bake`` or ``coffee``, but do **not** contain ``cake``: + + .. code-block:: javascript + + db.articles.runCommand( "text", { search: "bake coffee -cake" } ) + + .. example:: Search with Additional Query Conditions + + Use the ``filter`` option to include additional query conditions. + + Search for a single word ``coffee`` with an additional filter on + the ``about`` field, but limit the results to 2 documents + with the highest score and return only the ``subject`` field in + the matching documents: + + .. code-block:: javascript + + db.articles.runCommand( "text", { + search: "coffee", + filter: { about: /desserts/ }, + limit: 2, + project: { subject: 1, _id: 0 } + } + ) + + - The ``filter`` :ref:`query document ` + may use any of the available :doc:`query operators + `. + + - Because the ``_id`` field is implicitly included, in order to + return **only** the ``subject`` field, you must explicitly + exclude (``0``) the ``_id`` field. Within the ``project`` + document, you cannot mix inclusions (i.e. ``: 1``) and + exclusions (i.e. ``: 0``), except for the ``_id`` field. + + .. example:: Search a Different Language + + Use the ``language`` option to specify Spanish as the language + that determines the list of stop words and the rules for the + stemmer and tokenizer: + + .. 
code-block:: javascript + + db.articles.runCommand( "text", { + search: "leche", + language: "spanish" + } + ) + + See :ref:`text-search-languages` for the supported languages. + + .. important:: Specify the language in **lowercase**. diff --git a/source/reference/method/db.collection.ensureIndex.txt b/source/reference/method/db.collection.ensureIndex.txt index 68355e93549..ab5b1784934 100644 --- a/source/reference/method/db.collection.ensureIndex.txt +++ b/source/reference/method/db.collection.ensureIndex.txt @@ -9,50 +9,61 @@ db.collection.ensureIndex() Creates an index on the field specified, if that index does not already exist. - :param document keys: A :term:`document` that contains - pairs with the name of the field or - fields to index and order of the index. A + :param document keys: For ascending/descending indexes, a + :ref:`document ` + that contains pairs with the name of the field + or fields to index and order of the index. A ``1`` specifies ascending and a ``-1`` specifies descending. + For ``text`` indexes, see + :ref:`create-text-index`. + :param document options: A :term:`document` that controls the creation of the index. This argument is optional. .. warning:: Index names, including their full namespace (i.e. ``database.collection``) can be no longer than 128 characters. See the :method:`db.collection.getIndexes()` field - ":data:`~system.indexes.name`" for the names of existing indexes. + :data:`~system.indexes.name` for the names of existing indexes. - Consider the following prototype: + .. example:: Create an Ascending Index on a Single Field - .. code-block:: javascript + The following example creates an ascending index on the field + ``orderDate``. - db.collection.ensureIndex({ : 1}) + .. code-block:: javascript - This command creates an index, in ascending order, on the field - ``[key]``. 
+ db.collection.ensureIndex( { orderDate: 1 } ) - If the ``keys`` document specifies more than one field, than + If the ``keys`` document specifies more than one field, then :method:`db.collection.ensureIndex()` creates a :term:`compound - index`. To specify a compound index use the following form: + index`. + + .. example:: Create an Index on Multiple Fields - .. code-block:: javascript + The following example creates a compound index on the + ``orderDate`` field (in ascending order) and the ``zipcode`` + field (in descending order). - db.collection.ensureIndex({ : 1, : -1 }) + .. code-block:: javascript - This command creates a compound index on the ``key`` field - (in ascending order) and ``key1`` field (in descending order.) + db.collection.ensureIndex( { orderDate: 1, zipcode: -1 } ) .. note:: The order of an index is important for supporting :method:`cursor.sort()` operations using the index. - The :doc:`/indexes` section of this manual for full documentation - of indexes and indexing in MongoDB. + .. seealso:: + + - The :doc:`/indexes` section of this manual for full + documentation of indexes and indexing in MongoDB. - :method:`~db.collection.ensureIndex()` provides the following - options + - The :ref:`create-text-index` section for more information and + examples on creating ``text`` indexes. + + :method:`~db.collection.ensureIndex()` provides the following options: .. 
list-table:: :header-rows: 1 @@ -60,69 +71,139 @@ db.collection.ensureIndex() * - **Option** - **Value** - **Default** + - **Index Type** - * - background + * - :option:`background` - ``true`` or ``false`` - ``false`` - * - unique + - All + + * - :option:`unique` - ``true`` or ``false`` - ``false`` - * - name + - Ascending/Descending + + * - :option:`name` - string - none - * - dropDups + - All + + * - :option:`dropDups` - ``true`` or ``false`` - ``false`` - * - sparse + - Scalar + + * - :option:`sparse` - ``true`` or ``false`` - ``false`` - * - expireAfterSeconds + - Ascending/Descending + + * - :option:`expireAfterSeconds` - integer - none - * - v + - :term:`TTL` + + * - :option:`v` - index version - 1 + - All + + * - :option:`weights` + - document + - 1 + - Text + + * - :option:`default_language` + - string + - ``english`` + - Text + + * - :option:`language_override` + - string + - "language" + - Text + + :option boolean background: + + Specify ``true`` to build the index in the + background so that building an index will *not* + block other database activities. + + :option boolean unique: + + Specify ``true`` to create a unique index so that + the collection will not accept insertion of + documents where the index key or keys matches an + existing value in the index. + + :option string name: + + Specify the name of the index. If unspecified, + MongoDB will generate an index name by concatenating + the names of the indexed fields and the sort order. - :option boolean background: Specify ``true`` to build the index - in the background so that building an - index will *not* block other database - activities. - - :option boolean unique: Specify ``true`` to create a unique index - so that the collection will not accept - insertion of documents where the index key - or keys matches an existing value in the - index. - - :option string name: Specify the name of the index. 
If unspecified, - MongoDB will generate an index name by concatenating - the names of the indexed fields and the sort order. - - :option boolean dropDups: Specify ``true`` when creating a unique - index, on a field that *may* have - duplicate to index only the first - occurrence of a key, and **remove** all - documents from the collection that - contain subsequent occurrences of that - key. - - :option boolean sparse: If ``true``, the index only references - documents with the specified field. These - indexes use less space, but behave - differently in some situations - (particularly sorts.) - - :option integer expireAfterSeconds: Specify a value, in seconds, as - a :term:`TTL` to control how - long MongoDB will retain - documents in this collection. - See ":doc:`/tutorial/expire-data`" - for more information on this - functionality. - - :option v: Only specify a different index version in unusual - situations. The latest index version (version 1) provides a smaller - and faster index format. + :option boolean dropDups: + + Specify ``true`` when creating a unique index, on a + field that *may* have duplicate to index only the + first occurrence of a key, and **remove** all + documents from the collection that contain + subsequent occurrences of that key. + + :option boolean sparse: + + If ``true``, the index only references documents + with the specified field. These indexes use less + space, but behave differently in some situations + (particularly sorts.) + + :option integer expireAfterSeconds: + + Specify a value, in seconds, as a :term:`TTL` to + control how long MongoDB will retain documents in + this collection. See :doc:`/tutorial/expire-data` + for more information on this functionality. + + :option v: + + Only specify a different index version in unusual + situations. The latest index version (version 1) provides a + smaller and faster index format. + + :option document weights: + + For ``text`` index only. The document contains + field and weight pairs. 
The weight is a number + ranging from 1 to 99,999. + + The *weight* of the index field denotes the + significance of the field relative to the other + indexed fields in terms of the score. You can + specify weights for some or all of the indexed fields. + See :ref:`text-index-internals-weights` to adjust + the scores. + + :option string default_language: + + For ``text`` index only. Specify the language that + determines the list of stop words and the rules for + the stemmer and tokenizer. The default language for + the indexed data is ``english``. + + See :ref:`text-search-languages` for the available + languages and :ref:`text-index-default-language` for + more information and an example. + + :option string language_override: + + For ``text`` index only. + + Specify the name of the field in the document that + contains, for that document, the language to override + the default language. + + See + :doc:`/tutorial/create-text-index-on-multi-language-collection`. Please be aware of the following behaviors of :method:`ensureIndex() `: @@ -150,3 +231,16 @@ db.collection.ensureIndex() .. [#] The default index version depends on the version of :program:`mongod` running when creating the index. Before version 2.0, the this value was 0; versions 2.0 and later use version 1. + + .. seealso:: + + In addition to the ascending/descending indexes, MongoDB provides + the following index types to provide additional functionalities: + + - :ref:`index-feature-ttl` to support expiration of data, + + - :ref:`index-feature-geospatial` and + :ref:`index-geohaystack-index` to support geospatial queries, + and + + - :ref:`index-feature-text` to support text searches. 
diff --git a/source/reference/mongod.txt b/source/reference/mongod.txt index 16bed5c7579..739864b74aa 100644 --- a/source/reference/mongod.txt +++ b/source/reference/mongod.txt @@ -411,6 +411,7 @@ Options - :parameter:`supportCompatibilityFormPrivilegeDocuments` - :parameter:`syncdelay` - :parameter:`traceExceptions` + - :parameter:`textSearchEnabled` .. option:: --slowms diff --git a/source/reference/parameters.txt b/source/reference/parameters.txt index 8be4b6c9490..84b72b8c217 100644 --- a/source/reference/parameters.txt +++ b/source/reference/parameters.txt @@ -205,3 +205,21 @@ Parameters db.runCommand( { setParameter: 1, quiet: true } ) .. seealso:: :setting:`quiet` + +.. parameter:: textSearchEnabled + + .. versionadded:: 2.4 + + .. include:: /includes/warning-text-search-not-for-production.rst + + Enables the :doc:`text search ` feature. + You must enable the feature before creating or accessing a text + index. + + .. code-block:: sh + + mongod --setParameter textSearchEnabled=true + + If the flag is not enabled, you cannot create *new* ``text`` + indexes, and you cannot perform text searches. However, existing + ``text`` indexes will still be updated. diff --git a/source/reference/text-search.txt b/source/reference/text-search.txt new file mode 100644 index 00000000000..5df79f2b805 --- /dev/null +++ b/source/reference/text-search.txt @@ -0,0 +1,171 @@ +===================== +Text Search Reference +===================== + +.. default-domain:: mongodb + +.. _text-search-output: + +Text Search Output +------------------ + +The :dbcommand:`text` command returns a document, as in the following +example: + +.. warning:: + + The complete results of the :dbcommand:`text` command must fit + within the :limit:`BSON Document Size`. Use the ``limit`` and the + ``project`` parameters with the :dbcommand:`text` command to limit + the size of the result set. + +.. 
code-block:: javascript + + { + "queryDebugString" : "tomorrow||||||", + "language" : "english", + "results" : [ + { + "score" : 1.3125, + "obj": { + "_id" : ObjectId("50ecef5f8abea0fda30ceab3"), + "quote" : "tomorrow, and tomorrow, and tomorrow, creeps in this petty pace", + "related_quotes" : [ + "is this a dagger which I see before me", + "the handle toward my hand?" + ], + "src" : { + "title" : "Macbeth", + "from" : "Act V, Scene V" + }, + "speaker" : "macbeth" + } + } + ], + "stats" : { + "nscanned" : 1, + "nscannedObjects" : 0, + "n" : 1, + "nfound" : 1, + "timeMicros" : 163 + }, + "ok" : 1 + } + +The returned document contains the following fields: + +.. data:: text.queryDebugString + + For internal use only. + +.. data:: text.language + + The :data:`~text.language` field returns the language used for the + text search. This language determines the list of stop words and the + rules for the stemmer and tokenizer. + +.. data:: text.results + + The :data:`~text.results` field returns an array of result documents + that contain the information on the matching documents. The result + documents are ordered by the :data:`~text.results.score`. Each + result document contains: + + .. data:: text.results.obj + + The :data:`~text.results.obj` field returns the actual document + from the collection that contained the stemmed term or terms. + + .. data:: text.results.score + + The :data:`~text.results.score` field for the document that + contained the stemmed term or terms. The + :data:`~text.results.score` field signifies how well the document + matched the stemmed term or terms. See + :ref:`text-index-internals-weights` for how you can + adjust the scores for the matching words. + +.. data:: text.stats + + The :data:`~text.stats` field returns a document that contains the + query execution statistics. The :data:`~text.stats` field contains: + + .. 
data:: text.stats.nscanned + + The :data:`~text.stats.nscanned` field returns the total + number of index entries scanned. + + .. data:: text.stats.nscannedObjects + + The :data:`~text.stats.nscannedObjects` field returns the + total number of documents scanned. + + .. data:: text.stats.n + + The :data:`~text.stats.n` field returns the number of elements in + the :data:`~text.results` array. This number may be less than the + total number of matching documents, i.e. + :data:`~text.stats.nfound`, if the full result exceeds the + :limit:`BSON Document Size`. + + .. data:: text.stats.nfound + + The :data:`~text.stats.nfound` field returns the total + number of documents that match. This number may be greater than + the size of the :data:`~text.results` array, i.e. + :data:`~text.stats.n`, if the result set exceeds the :limit:`BSON + Document Size`. + + .. data:: text.stats.timeMicros + + The :data:`~text.stats.timeMicros` field returns the time in + microseconds for the search. + +.. data:: text.ok + + The :data:`~text.ok` field returns the status of the :dbcommand:`text` + command. + +.. _text-search-languages: + +Text Search Languages +--------------------- + +The :ref:`text index ` and the :dbcommand:`text` +command support the following languages: + +- ``danish`` + +- ``dutch`` + +- ``english`` + +- ``finnish`` + +- ``french`` + +- ``german`` + +- ``hungarian`` + +- ``italian`` + +- ``norwegian`` + +- ``portuguese`` + +- ``romanian`` + +- ``russian`` + +- ``spanish`` + +- ``swedish`` + +- ``turkish`` + +.. note:: + + If you specify a language value of ``"none"``, then the text search + has no list of stop words, and the text search does not stem or + tokenize the search terms. 
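To make the relationship between the ``stats.n`` and ``stats.nfound`` fields in the Text Search Output section concrete, here is a small sketch in plain JavaScript. The helper ``summarizeTextResult`` is hypothetical, and the sample document is an abbreviated copy of the example output shown in that section. The sketch assumes that ``ok`` equal to 1 indicates success, as in that example.

```javascript
// Hypothetical helper: condenses a document shaped like the output of the
// ``text`` command into the fields an application usually inspects.
function summarizeTextResult(res) {
  // Assumption: ``ok`` is 1 for a successful command, as in the sample output.
  if (res.ok !== 1) {
    throw new Error("text command did not report success");
  }
  return {
    language: res.language,
    returned: res.stats.n,        // number of elements in the results array
    matched: res.stats.nfound,    // total number of matching documents
    // True when fewer documents came back than matched the search.
    partial: res.stats.n < res.stats.nfound,
    // Results are ordered by score, so the first element has the top score.
    topScore: res.results.length > 0 ? res.results[0].score : null
  };
}

// Abbreviated copy of the sample output from the reference section.
var sample = {
  language: "english",
  results: [ { score: 1.3125, obj: { speaker: "macbeth" } } ],
  stats: { nscanned: 1, nscannedObjects: 0, n: 1, nfound: 1, timeMicros: 163 },
  ok: 1
};

var summary = summarizeTextResult(sample);
```

When ``partial`` is true, narrowing the ``project`` document or lowering ``limit`` is the documented way to keep the result within the size limit.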
diff --git a/source/reference/user-privileges.txt b/source/reference/user-privileges.txt index 13468bbb5d1..4609447e537 100644 --- a/source/reference/user-privileges.txt +++ b/source/reference/user-privileges.txt @@ -68,7 +68,7 @@ Database User Roles - :dbcommand:`geoWalk` - :dbcommand:`group` - :dbcommand:`mapReduce` (inline output only.) - - :dbcommand:`text` (experimental feature.) + - :dbcommand:`text` (beta feature.) .. authrole:: readWrite diff --git a/source/release-notes/2.4.txt b/source/release-notes/2.4.txt index a47f6f08f1b..9b84f7e5173 100644 --- a/source/release-notes/2.4.txt +++ b/source/release-notes/2.4.txt @@ -45,530 +45,123 @@ See :doc:`/release-notes/2.4-upgrade` for full upgrade instructions. Changes ------- -Text Indexes -~~~~~~~~~~~~ +Text Search +~~~~~~~~~~~ -.. note:: +Text search supports the search of string content in documents of a +collection. Text search introduces a :ref:`text index +` type and a new :dbcommand:`text` command. See +:doc:`/applications/text-search` for more information on the text search +feature. - The ``text`` index type is currently an experimental feature. - To use a ``text`` index, you need to enable it at run time or - startup. +Enable Text Search +`````````````````` -Background -`````````` +.. include:: /includes/warning-text-search-not-for-production.rst + :end-before: significant-storage-requirements -MongoDB 2.3.2 includes a new ``text`` index type. ``text`` indexes -support boolean text search queries: +To perform text search, you need to enable the text search feature. -- Any set of fields containing string data may be text indexed. + +.. include:: /includes/fact-text-search-beta.rst + +You can enable the ``text`` search feature at startup. See +:doc:`/tutorial/enable-text-search` for details. -- You may only maintain a **single** ``text`` index per collection. 
+Index Behavior +`````````````` + +To support text search, MongoDB 2.4 introduces a new ``text`` index +type: + +- You can only access the index with the :dbcommand:`text` command. + +- Any set of fields containing string data may be text indexed. - ``text`` indexes are fully consistent and updated in real-time as applications insert, update, or delete documents from the database. -- The ``text`` index and query system supports language specific - stemming and stop words. Additionally: - - - Indexes and queries drop stop words (i.e. "the," "an," "a," "and," - etc.) +- ``text`` indexes drop language-specific stop words (e.g. in English, + “the,” “an,” “a,” “and,” etc.) and use simple language-specific + suffix stemming. See :ref:`text-search-languages` for the supported + languages. - - MongoDB stores words stemmed during insertion, using simple suffix - stemming, and includes support for a number of languages. MongoDB - automatically stems :dbcommand:`text` queries before beginning the - query. +- MongoDB stores words stemmed during insertion, using simple suffix + stemming. MongoDB automatically stems :dbcommand:`text` queries + before beginning the query. -However, ``text`` indexes have large storage requirements and incur -**significant** performance costs: +``text`` indexes have the following storage requirements and +performance impacts: - Text indexes can be large. They contain one index entry for each unique post-stemmed word in each indexed field for each document inserted. - Building a ``text`` index is very similar to building a large - multi-key index, and therefore may take longer than building a - simple ordered (scalar) index. + multi-key index, and therefore may take longer than building a simple + ordered (scalar) index. 
-- ``text`` indexes will impede insertion throughput, because MongoDB +- ``text`` indexes will impact insertion throughput because MongoDB must add an index entry for each unique post-stemmed word in each indexed field of each new source document. +- Additionally, ``text`` indexes do not store phrases or information + about the proximity of words in the documents. As a result, phrase + queries will run much more effectively when the entire collection + fits in RAM. + +Additionally, the current **beta** implementation of ``text`` +indexes have the following limitations and behaviors: + - Some :dbcommand:`text` searches may affect performance on your :program:`mongod`, particularly for negation queries and phrase matches that cannot use the index as effectively as other kinds of queries. -Additionally, the current *experimental* implementation of ``text`` -indexes have the following limitations and behaviors: - -- ``text`` indexes do not store phrases or information about the - proximity of words in the documents. As a result, phrase queries - will run much more effectively when the entire collection fits in - RAM. - - MongoDB does not stem phrases or negations in :dbcommand:`text` queries. - The index is case-insensitive. -- A collection may only have a single ``text`` index at a time. - -.. warning:: - - Do **not** enable or use ``text`` indexes on production systems. - -.. May be worth including this: - - For production-grade search requirements consider using a - third-party search tool, and the `mongo-connector - `_ or a similar - integration strategy to provide more advanced search capabilities. - -Test ``text`` Indexes -````````````````````` - -The ``text`` index type is an experimental feature and you need to -enable the feature before creating or accessing a text index. +- .. 
include:: /includes/fact-text-index-limit-one.rst -To enable text indexes, issue the following command in the -:program:`mongo` shell: +- API is subject to change in subsequent releases. -.. warning:: +Create Text Index +`````````````````` - Do **not** enable or use ``text`` indexes on production systems. +The following example creates a ``text`` index on the field ``content`` +in the collection ``articles``: .. code-block:: javascript - db.adminCommand( { setParameter: 1, textSearchEnabled: true } ) - -You can also start the :program:`mongod` with the following -invocation: - -.. code-block:: sh - - mongod --setParameter textSearchEnabled=true - -Create Text Indexes -^^^^^^^^^^^^^^^^^^^ - -To create a ``text`` index, use the following syntax of -:method:`~db.collection.ensureIndex()`: - -.. code-block:: javascript - - db.collection.ensureIndex( { : "text" } ) - -Consider the following example: + db.articles.ensureIndex( { content: "text" } ) -.. code-block:: javascript - - db.collection.ensureIndex( { content: "text" } ) - -This ``text`` index catalogs all string data in the ``content`` field -where the ``content`` field contains a string or an array of string -elements. To index fields in sub-documents, you need to specify the -individual fields from the sub-documents using the :term:`dot -notation`. A ``text`` index can include multiple fields, as in the -following: - -.. code-block:: javascript - - db.collection.ensureIndex( { content: "text", - "users.comments": "text", - "users.profiles": "text" } ) - -The default name for the index consists of the ```` -concatenated with ``_text`` for the indexed fields, as in the following: - -.. code-block:: javascript +For more information on text indexes and examples, see +:ref:`create-text-index`. - "content_text_users.comments_text_users.profiles_text" - -These indexes may run into the :limit:`Index Name Length` limit. 
To -avoid creating an index with a too-long name, you can specify a name -in the options parameter, as in the following: - -.. code-block:: javascript - - db.collection.ensureIndex( { content: "text", - "users.profiles": "text" }, - { name: "TextIndex" } ) - -When creating ``text`` indexes you may specify *weights* for specific -fields. *Weights* are factored into the relevant score for each -document. The score for a given word in a document is the weighted sum -of the frequency for each of the indexed fields in that document. -Consider the following: - -.. code-block:: javascript - - db.collection.ensureIndex( { content: "text", - "users.profiles": "text" }, - { name: "TextIndex", - weights: { content: 1, - "users.profiles": 2 } } ) - -This example creates a ``text`` index on the top-level field named -``content`` and the ``profiles`` field in the ``users`` -sub-documents. Furthermore, the ``content`` field has a weight of 1 and -the ``users.profiles`` field has a weight of 2. - -You can add a conventional ascending or descending index field(s) as a -prefix or suffix of the index. You cannot include :ref:`multi-key -` index field nor :ref:`geospatial -` index field. - -If you create an ascending or descending index as a prefix of a -``text`` index: - -- MongoDB will only index documents that have the prefix field - (i.e. ``username``) and - -- The :dbcommand:`text` query can limit the number of index entries to - review in order to perform the query. - -- All :dbcommand:`text` queries using this index must include the - ``filter`` option that specifies an equality condition for the prefix - field or fields. - -Create this index with the following operation: - -.. code-block:: javascript - - db.collection.ensureIndex( { username: 1, - "users.profiles": "text" } ) - -Alternatively you create an ascending or descending index as a suffix -to a ``text`` index. 
Then the ``text`` index can support -:ref:`covered queries ` if the -:dbcommand:`text` command specifies a ``project`` option. - -Create this index with the following operation: - -.. code-block:: javascript - - db.collection.ensureIndex( { "users.profiles": "text", - username: 1 } ) - -Finally, you may use the special wild card field specifier (i.e. -``$**``) to specify index weights and fields. Consider the following -example that indexes any string value in the data of every field of -every document in a collection and names it ``TextIndex``: - -.. code-block:: javascript - - db.collection.ensureIndex( { "$**": "text", - username: 1 }, - { name: "TextIndex" } ) - -By default, an index field has a weight of ``1``. You may specify -weights for a ``text`` index with compound fields, as in the following: - -.. code-block:: javascript - - db.collection.ensureIndex( { content: "text", - "users.profiles": "text", - comments: "text", - keywords: "text", - about: "text" }, - { name: "TextIndex", - weights: - { content: 10, - "user.profiles": 2, - keywords: 5, - about: 5 } } ) - -This index, named ``TextIndex``, includes a number of fields, with the -following weights: - -- ``content`` field that has a weight of 10, -- ``users.profiles`` that has a weight of 2, -- ``comments`` that has a weight of 1, -- ``keywords`` that has a weight of 5, and -- ``about`` that has a weight of 5. - -This means that documents that match words in the ``content`` field -will appear in the result set more than all other fields in the index, -and that the ``user.profiles`` and ``comments`` fields will be less -likely to appear in responses than words from other fields. - -.. note:: +Text Command +```````````` - You must drop a ``text`` index using the name specified when you - created the index. 
Alternatively, if you did not specify a name - when creating the index, you can find the name using - :method:`db.collection.getIndexes()` +To query ``text`` indexes, you must use the :dbcommand:`text` command. +The :dbcommand:`text` command: -.. _text-index-specify-language: +- Is case-insensitive. -Specify Languages for Text Index -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +- Tokenizes and stems the search term(s) during the execution. -The default language associated with the indexed data determines the -list of stop words and the rules for the stemmer and tokenizer. The -default language for the indexed data is ``english``. +- For each tokenized/stemmed search term, assigns a score to each + document that contains the term in the indexed fields. The score + determines the relevance of a document to a given search query. -Use the ``default_language`` option when creating the ``text`` index to -specify a different language. See :ref:`text-search-languages`. +By default, the command returns the top 100 scoring documents in +descending order, but you can specify a different limit. -The following example creates a ``text`` index on the -``content`` field and sets the ``default_language`` to -``spanish``: +The following example returns documents from the ``articles`` +collection that contain the word ``coffee``, case-insensitive. .. code-block:: javascript - db.collection.ensureIndex( { content : "text" }, - { default_language: "spanish" } ) - -If a collection contains documents that are in different languages, the -individual documents can specify the language to use. - -- By default, if the documents in the collection contain a field named - ``language``, the value of the ``language`` field overrides the - default language. - - For example, the following document overrides the default language - ``spanish`` with ``portuguese``, the value in its ``language`` field. - - .. 
code-block:: javascript - - { content: "A sorte protege os audazes", language: "portuguese" } - -- To use a different field to override the default language, specify the - field with the ``language_override`` option when creating the index. - - For example, if the documents contain the field named ``myLanguage`` - instead of ``language``, create the ``text`` index with the - ``language_override`` option. - - .. code-block:: javascript - - db.collection.ensureIndex( { content : "text" }, - { language_override: "myLanguage" } ) - -.. .. note:: -.. If you specify a ``default_language`` of ``"none"``, or the override - language is ``"none"``, the :dbcommand:`text` command will not stem - the words. The command will also consider all words, i.e., it will not - drop the stop words. - -Text Queries -^^^^^^^^^^^^ - -MongoDB 2.3.2 introduces the :dbcommand:`text` command to provide -query support for ``text`` indexes. Unlike normal MongoDB queries, -:dbcommand:`text` returns a document rather than a -cursor. - -.. dbcommand:: text - - The :dbcommand:`text` provides an interface to search text context - stored in the ``text`` index. Consider the following prototype: - :dbcommand:`text`: - - .. code-block:: javascript - - db.collection.runCommand( "text", { search: , - filter: , - project: , - limit: , - language: } ) - - The :dbcommand:`text` command has the following parameters: - - :param string search: - - A text string that MongoDB stems and uses to query the ``text`` - index. In the :program:`mongo` shell, to specify a phrase to - match, you can either: - - - enclose the phrase in *escaped* double quotes and use double - quotes to specify the ``search`` string, as in ``"\"coffee - table\""``, or - - - enclose the phrase in double quotes and use *single* quotes to - specify the ``search`` string, as in ``'"coffee table"'`` - - :param document filter: - - Optional. A :ref:`query document ` to - further limit the results of the query using another database - field. 
You can use any valid MongoDB query in the filter - document, except if the index includes an ascending or descending - index field as a prefix. - - If the index includes an ascending or descending index field as a - prefix, the ``filter`` is required and the ``filter`` query must be - an equality match. - - :param document project: - - Optional. Allows you to limit the fields returned by the query - to only those specified. - - :param number limit: - - Optional. Specify the maximum number of documents to include in - the response. The :dbcommand:`text` sorts the results before - applying the ``limit``. - - The default limit is 100. - - :param string language: - - Optional. Specify, for the search, the language that determines - the list of stop words and the rules for the stemmer and - tokenizer. The default language is the value of the - ``default_language`` field specified during the index creation. - See :ref:`text-search-languages` for the supported languages. - - :return: - - :dbcommand:`text` returns results, in descending order by score, - in the form of a document. Results must fit within the - :limit:`BSON Document Size`. Use the ``limit`` and the - ``project`` parameters to limit the size of the result set. - - The implicit connector between the terms of a multi-term search is a - disjunction (``OR``). Search for ``"first second"`` searches - for ``"first"`` or ``"second"``. The scoring system will prefer - documents that contain all terms. - - However, consider the following behaviors of :dbcommand:`text` - queries: - - - With phrases (i.e. terms enclosed in escaped quotes), the search - performs an ``AND`` with any other terms in the search string; - e.g. search for ``"\"twinkle twinkle\" little star"`` searches for - ``"twinkle twinkle"`` and (``"little"`` or ``"star"``). - - - :dbcommand:`text` adds all negations to the query with the - logical ``AND`` operator. - -.. example:: - - Consider the following examples of :dbcommand:`text` queries. 
All - examples assume that you have a ``text`` index on the field named - ``content`` in a collection named ``collection``. - - #. Create a ``text`` index on the ``content`` field to enable text - search on the field: - - .. code-block:: javascript - - db.collection.ensureIndex( { content: "text" } ) - - #. Search for a single word ``coffee``: - - .. code-block:: javascript - - db.collection.runCommand( "text", { search: "coffee" } ) - - This query returns documents that contain the word ``coffee``, - case-insensitive, in the ``content`` field. - - #. Search for multiple words, ``bake`` or ``coffee`` or ``cake``: - - .. code-block:: javascript - - db.collection.runCommand( "text", { search: "bake coffee cake" } ) - - This query returns documents that contain the either ``bake`` - **or** ``coffee`` **or** ``cake`` in the ``content`` field. - - #. Search for the exact phrase ``bake coffee cake``: - - .. code-block:: javascript - - db.collection.runCommand( "text", { search: "\"bake coffee cake\"" } ) - - This query returns documents that contain the exact phrase - ``bake coffee cake``. - - #. Search for documents that contain the words ``bake`` or ``coffee``, - but **not** ``cake``: - - .. code-block:: javascript - - db.collection.runCommand( "text", { search: "bake coffee -cake" } ) - - Use the ``-`` as a prefix to terms to specify negation in the - search string. The query returns documents that contain the - either ``bake`` **or** ``coffee``, but **not** ``cake``, all - case-insensitive, in the ``content`` field. Prefixing a word - with a hyphen (``-``) negates a word: - - - The negated word filters out documents from the result set, - after selecting documents. - - - A ```` that only contains negative words returns no match. - - - A hyphenated word, such as ``case-insensitive``, is not a - negation. The :dbcommand:`text` command treats the hyphen as a - delimiter. - - #. 
Search for a single word ``coffee`` with an additional ``filter`` on - the ``about`` field, but **limit** the results to 2 documents with the - highest score and return only the ``comments`` field in the matching - documents: - - .. code-block:: javascript - - db.collection.runCommand( "text", { - search: "coffee", - filter: { about: /desserts/ }, - limit: 2, - project: { comments: 1, _id: 0 } - } - ) - - - The ``filter`` :ref:`query document ` - may use any of the available :doc:`query operators - `. - - - Because the ``_id`` field is implicitly included, in order to - return **only** the ``comments`` field, you must explicitly - exclude (``0``) the ``_id`` field. Within the ``project`` - document, you cannot mix inclusions (i.e. ``: 1``) and - exclusions (i.e. ``: 0``), except for the ``_id`` field. - -.. _text-search-languages: - -Languages Supported in Text Search -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The ``text`` index and the :dbcommand:`text` command support the -following languages: - -- ``danish`` - -- ``dutch`` - -- ``english`` - -- ``finnish`` - -- ``french`` - -- ``german`` - -- ``hungarian`` - -- ``italian`` - -- ``norwegian`` - -- ``portuguese`` - -- ``romanian`` - -- ``russian`` - -- ``spanish`` - -- ``swedish`` - -- ``turkish`` + db.articles.runCommand( "text", { search: "coffee" } ) .. _kerberos-authentication: @@ -654,7 +247,7 @@ Improved Concurrency Previously, MongoDB operations that required the JavaScript interpreter had to acquire a lock, and a single :program:`mongod` could only run a single JavaScript operation at a time. The switch to V8 improves -concurrency by permiting multiple JavaScript operations to run at the +concurrency by permitting multiple JavaScript operations to run at the same time. 
Modernized JavaScript Implementation (ES5) diff --git a/source/tutorial.txt b/source/tutorial.txt index 327fbea8e4f..f9b7bee13b2 100644 --- a/source/tutorial.txt +++ b/source/tutorial.txt @@ -74,6 +74,20 @@ Application Development - :doc:`/tutorial/write-a-tumblelog-application-with-django-mongodb-engine` - :doc:`/tutorial/write-a-tumblelog-application-with-flask-mongoengine` +.. index:: tutorials; text search +.. index:: text search tutorials +.. _tutorials-text-search: +.. _tutorial-text-search: + +Text Search Patterns +-------------------- + +- :doc:`/tutorial/enable-text-search` +- :doc:`/tutorial/search-for-text` +- :doc:`/tutorial/create-text-index-on-multi-language-collection` +- :doc:`/tutorial/return-text-queries-using-only-text-index` +- :doc:`/tutorial/limit-number-of-items-scanned-for-text-search` + Data Modeling Patterns ---------------------- diff --git a/source/tutorial/create-text-index-on-multi-language-collection.txt b/source/tutorial/create-text-index-on-multi-language-collection.txt new file mode 100644 index 00000000000..c9be86d88ae --- /dev/null +++ b/source/tutorial/create-text-index-on-multi-language-collection.txt @@ -0,0 +1,97 @@ +====================================================== +Create a ``text`` Index on a Multi-language Collection +====================================================== + +.. default-domain:: mongodb + +Specify the Index Language within the Document +---------------------------------------------- + +If a collection contains documents that are in different languages, +include a field in the documents that contain the language to use: + +- If you include a field named ``language`` in the document, by + default, the :dbcommand:`db.collection.ensureIndex()` method will use + the value of this field to override the default language. 
+ +- To use a field with a name other than ``language``, you must specify + the name of this field to the + :dbcommand:`db.collection.ensureIndex()` method with the + ``language_override`` option. + +See :ref:`text-search-languages` for a list of supported languages. + +Include the ``language`` Field +------------------------------ + +Include a field ``language`` that specifies the language to use for the +individual documents. + +For example, the documents of a multi-language collection ``quotes`` +contain the field ``language``: + +.. code-block:: javascript + + { _id: 1, language: "portuguese", quote: "A sorte protege os audazes" } + { _id: 2, language: "spanish", quote: "Nada hay más surreal que la realidad." } + { _id: 3, language: "english", quote: "is this a dagger which I see before me" } + +Create a ``text`` index on the field ``quote``: + +.. code-block:: javascript + + db.quotes.ensureIndex( { quote: "text" } ) + +- For the documents that contain the ``language`` field, the ``text`` + index uses that language to determine the stop words and the rules + for the stemmer and the tokenizer. + +- For documents that do not contain the ``language`` field, the index + uses the default language, which is English, to determine the stop + words and rules for the stemmer and the tokenizer. + +For example, the Spanish word ``que`` is a stop word. So the +following :dbcommand:`text` command would not match any document: + +.. code-block:: javascript + + db.quotes.runCommand( "text", { search: "que", language: "spanish" } ) + +Use any Field to Specify the Language for a Document +---------------------------------------------------- + +Include a field that specifies the language to use for the individual +documents. To use a field with a name other than ``language``, include +the ``language_override`` option when creating the index. + +For example, the documents of a multi-language collection ``quotes`` +contain the field ``idioma``: + +.. 
code-block:: javascript + + { _id: 1, idioma: "portuguese", quote: "A sorte protege os audazes" } + { _id: 2, idioma: "spanish", quote: "Nada hay más surreal que la realidad." } + { _id: 3, idioma: "english", quote: "is this a dagger which I see before me" } + +Create a ``text`` index on the field ``quote`` with the +``language_override`` option: + +.. code-block:: javascript + + db.quotes.ensureIndex( { quote : "text" }, + { language_override: "idioma" } ) + +- For the documents that contain the ``idioma`` field, the ``text`` + index uses that language to determine the stop words and the rules + for the stemmer and the tokenizer. + +- For documents that do not contain the ``idioma`` field, the index + uses the default language, which is English, to determine the stop + words and rules for the stemmer and the tokenizer. + +For example, the Spanish word ``que`` is a stop word. So the +following :dbcommand:`text` command would not match any document: + +.. code-block:: javascript + + db.quotes.runCommand( "text", { search: "que", language: "spanish" } ) diff --git a/source/tutorial/enable-text-search.txt b/source/tutorial/enable-text-search.txt new file mode 100644 index 00000000000..96b89cf9e02 --- /dev/null +++ b/source/tutorial/enable-text-search.txt @@ -0,0 +1,32 @@ +================== +Enable Text Search +================== + +.. default-domain:: mongodb + +.. versionadded:: 2.4 + +.. include:: /includes/fact-text-search-beta.rst + +.. include:: /includes/warning-text-search-not-for-production.rst + +You can enable the text search feature at startup with the +:parameter:`textSearchEnabled` parameter: + +.. code-block:: sh + + mongod --setParameter textSearchEnabled=true + +You may prefer to set the :setting:`textSearchEnabled` parameter in the +:doc:`configuration file `. + +Additionally, you can enable the feature in the :program:`mongo` shell +with the :dbcommand:`setParameter` command. This command does **not** +propagate from the primary to the secondaries. 
You must enable on +**each and every** :program:`mongod` for replica sets. + +.. note:: + + You must set the parameter every time you start the server. You may + prefer to add the parameter to the :doc:`configuration files + `. diff --git a/source/tutorial/limit-number-of-items-scanned-for-text-search.txt b/source/tutorial/limit-number-of-items-scanned-for-text-search.txt new file mode 100644 index 00000000000..8a34f28b93d --- /dev/null +++ b/source/tutorial/limit-number-of-items-scanned-for-text-search.txt @@ -0,0 +1,109 @@ +========================================================= +Limit the Number of Index Entries Scanned for Text Search +========================================================= + +.. default-domain:: mongodb + +The :dbcommand:`text` command includes the ``filter`` option to further +restrict the results of a text search. For a ``filter`` that specifies +equality conditions, this tutorial demonstrates how to perform text +searches on only those documents that match the ``filter`` conditions, +as opposed to performing a text search first on all the documents and +then matching on the ``filter`` condition. + +Consider a collection ``inventory`` that contains the following +documents: + +.. code-block:: javascript + + { _id: 1, dept: "tech", description: "a fun green computer" } + { _id: 2, dept: "tech", description: "a wireless red mouse" } + { _id: 3, dept: "kitchen", description: "a green placemat" } + { _id: 4, dept: "kitchen", description: "a red peeler" } + { _id: 5, dept: "food", description: "a green apple" } + { _id: 6, dept: "food", description: "a red potato" } + +A common use case is to perform text searches by individual +departments, such as: + +.. 
code-block:: javascript + + db.inventory.runCommand( "text", { + search: "green", + filter: { dept : "kitchen" } + } + ) + +To limit the text search to scan only those documents within a specific +``dept``, create a compound index that specifies an +ascending/descending index key on the field ``dept`` and a ``text`` +index key on the field ``description``: + +.. code-block:: javascript + + db.inventory.ensureIndex( + { + dept: 1, + description: "text" + } + ) + +.. important:: + + - The ascending/descending index keys must be listed before, or + prefix, the ``text`` index keys. + + - By prefixing the ``text`` index fields with ascending/descending + index fields, MongoDB will **only** index documents that have the + prefix fields. + + - You cannot include :ref:`multi-key ` index + fields or :ref:`geospatial ` index + fields. + + - The :dbcommand:`text` command **must** include the ``filter`` + option that specifies an **equality** condition for the prefix + fields. + +Then, the text search within a particular department will limit the +scan of indexed documents. For example, the following :dbcommand:`text` +command scans only those documents with ``dept`` equal to ``kitchen``: + +.. code-block:: javascript + + db.inventory.runCommand( "text", { + search: "green", + filter: { dept : "kitchen" } + } + ) + +The returned result includes statistics that show that the command +scanned 1 document, as indicated by the ``nscanned`` field: + +.. code-block:: javascript + + { + + "queryDebugString" : "green||||||", + "language" : "english", + "results" : [ + { + "score" : 0.75, + "obj" : { + "_id" : 3, + "dept" : "kitchen", + "description" : "a green placemat" + } + } + ], + "stats" : { + "nscanned" : 1, + "nscannedObjects" : 0, + "n" : 1, + "nfound" : 1, + "timeMicros" : 211 + }, + "ok" : 1 + } + +For more information on the result set, see :ref:`text-search-output`. 
diff --git a/source/tutorial/model-data-for-keyword-search.txt b/source/tutorial/model-data-for-keyword-search.txt index 3d4ca8262a6..79d93f8d4ec 100644 --- a/source/tutorial/model-data-for-keyword-search.txt +++ b/source/tutorial/model-data-for-keyword-search.txt @@ -4,6 +4,16 @@ Model Data to Support Keyword Search .. default-domain:: mongodb +.. note:: + + Keyword search is *not* the same as text search or full text + search, and does not provide stemming or other text-processing + features. See the :ref:`limit-keyword-indexes` section for more + information. + + In 2.4, MongoDB provides a text search feature. See + :doc:`/applications/text-search` for more information. + If your application needs to perform queries on the content of a field that holds text you can perform exact matches on the text or use :operator:`$regex` to use regular expression pattern matches. However, @@ -16,13 +26,6 @@ keywords stored in an array in the same document as the text field. Combined with a :ref:`multi-key index `, this pattern can support application's keyword search operations. -.. note:: - - Keyword search is *not* the same as text search or full text - search, and does not provide stemming or other text-processing - features. See the :ref:`limit-keyword-indexes` section for more - information. - Pattern ------- diff --git a/source/tutorial/return-text-queries-using-only-text-index.txt b/source/tutorial/return-text-queries-using-only-text-index.txt new file mode 100644 index 00000000000..0583c9fc6f3 --- /dev/null +++ b/source/tutorial/return-text-queries-using-only-text-index.txt @@ -0,0 +1,41 @@ +=============================================== +Return Text Queries Using Only a ``text`` Index +=============================================== + +.. default-domain:: mongodb + +To create a ``text`` index that can :ref:`cover queries +`: + +#. 
Append scalar index fields to a ``text`` index, as in the following + example which specifies an ascending index key on ``username``: + + .. code-block:: javascript + :emphasize-lines: 2 + + db.collection.ensureIndex( { comments: "text", + username: 1 } ) + + .. warning:: + + You cannot include :ref:`multi-key ` index + fields or :ref:`geospatial ` index fields. + +#. Use the ``project`` option in the :dbcommand:`text` command to + return only the fields in the index, as in the following: + + .. code-block:: javascript + :emphasize-lines: 2-4 + + db.quotes.runCommand( "text", { search: "tomorrow", + project: { username: 1, + _id: 0 + } + } + ) + +.. note:: + + By default, the ``_id`` field is included in the result set. Since + the example index did not include the ``_id`` field, you must + explicitly exclude the field in the ``project`` document. diff --git a/source/tutorial/search-for-text.txt b/source/tutorial/search-for-text.txt new file mode 100644 index 00000000000..a0d8332bdc7 --- /dev/null +++ b/source/tutorial/search-for-text.txt @@ -0,0 +1,238 @@ +============================== +Search String Content for Text +============================== + +.. default-domain:: mongodb + +In 2.4, you can enable the text search feature to create ``text`` +indexes and issue text queries using the :dbcommand:`text` command. + +The following tutorial offers various query patterns for using the text +search feature. + +The examples in this tutorial use a collection ``quotes`` that has a +``text`` index on the fields ``quote``, which contains a string, and +``related_quotes``, which contains an array of string elements. + +Search for a Term +----------------- + +The following command searches for the word ``TOMORROW``: + +.. code-block:: javascript + + db.quotes.runCommand( "text", { search: "TOMORROW" } ) + +Because the :dbcommand:`text` command is case-insensitive, the text search +will match the following document in the ``quotes`` collection: + +.. 
code-block:: javascript + + { + "_id" : ObjectId("50ecef5f8abea0fda30ceab3"), + "quote" : "tomorrow, and tomorrow, and tomorrow, creeps in this petty pace", + "related_quotes" : [ + "is this a dagger which I see before me", + "the handle toward my hand?" + ], + "src" : { + "title" : "Macbeth", + "from" : "Act V, Scene V" + }, + "speaker" : "macbeth" + } + +Match Any of the Search Terms +----------------------------- + +If the search string contains space-delimited terms, the :dbcommand:`text` +command performs a logical ``OR`` search on each term and returns +documents that contain any of the terms. + +For example, the search string ``"tomorrow largo"`` searches for the term +``tomorrow`` **OR** the term ``largo``: + +.. code-block:: javascript + + db.quotes.runCommand( "text", { search: "tomorrow largo" } ) + +The command will match the following documents in the ``quotes`` +collection: + +.. code-block:: javascript + + { + "_id" : ObjectId("50ecef5f8abea0fda30ceab3"), + "quote" : "tomorrow, and tomorrow, and tomorrow, creeps in this petty pace", + "related_quotes" : [ + "is this a dagger which I see before me", + "the handle toward my hand?" + ], + "src" : { + "title" : "Macbeth", + "from" : "Act V, Scene V" + }, + "speaker" : "macbeth" + } + + { + "_id" : ObjectId("50ecf0cd8abea0fda30ceab4"), + "quote" : "Es tan corto el amor y es tan largo el olvido.", + "related_quotes" : [ + "Como para acercarla mi mirada la busca.", + "Mi corazón la busca, y ella no está conmigo." + ], + "speaker" : "Pablo Neruda", + "src" : { + "title" : "Veinte poemas de amor y una canción desesperada", + "from" : "Poema 20" + } + } + +.. _text-search-phrases: + +Match Phrases +------------- + +To match an exact phrase that includes one or more spaces as a single +term, enclose the phrase in escaped double quotes. + +For example, the following command searches for the exact phrase ``"and +tomorrow"``: + +.. 
code-block:: javascript + + db.quotes.runCommand( "text", { search: "\"and tomorrow\"" } ) + +If the search string contains both a phrase and individual terms, the +:dbcommand:`text` command performs a compound logical ``AND`` of the +phrase with the compound logical ``OR`` of the single terms. + +For example, the following command uses a search string that +contains the individual terms ``corto`` and ``largo`` as well as the +phrase ``\"and tomorrow\"``: + +.. code-block:: javascript + + db.quotes.runCommand( "text", { search: "corto largo \"and tomorrow\"" } ) + +The :dbcommand:`text` command performs the equivalent of the following +logical operation: + +.. code-block:: javascript + + (corto OR largo OR tomorrow) AND ("and tomorrow") + +.. _text-search-negation: + +Match Some Words But Not Others +------------------------------- + +A *negated* term is a term that is prefixed by a minus sign ``-``. If +you negate a term, the :dbcommand:`text` command will exclude the +documents that contain that term from the results. + +.. note:: + + If the search text contains *only* negated terms, the + :dbcommand:`text` command will not return any results. + +The following example returns those documents that contain the term +``tomorrow`` but **not** the term ``petty``: + +.. code-block:: javascript + + db.quotes.runCommand( "text" , { search: "tomorrow -petty" } ) + +.. _text-search-limit: + +Limit the Number of Matching Documents in the Result Set +-------------------------------------------------------- + +.. note:: + + The result from the :dbcommand:`text` command must fit within the + maximum :limit:`BSON Document Size`. + +By default, the :dbcommand:`text` command will return up to 100 +matching documents, from highest to lowest score. To override this +default limit, use the ``limit`` option in the :dbcommand:`text` +command, as in the following example: + +.. 
code-block:: javascript + + db.quotes.runCommand( "text", { search: "tomorrow", limit: 2 } ) + +The :dbcommand:`text` command will return at most ``2`` of the +*highest scoring* results. + +The ``limit`` can be any number as long as the result set fits within +the maximum :limit:`BSON Document Size`. + +.. _text-search-project: + +Specify Which Fields to Return in the Result Set +------------------------------------------------ + +In the :dbcommand:`text` command, use the ``project`` option to specify +the fields to include (``1``) or exclude (``0``) in the matching +documents. + +.. note:: + + The ``_id`` field is always returned unless explicitly excluded in + the ``project`` document. + +The following example returns only the ``_id`` field and the ``src`` +field in the matching documents: + +.. code-block:: javascript + + db.quotes.runCommand( "text", { search: "tomorrow", + project: { "src": 1 } } ) + +.. _text-search-filter: + +Search with Additional Query Conditions +--------------------------------------- + +The :dbcommand:`text` command can also use the ``filter`` option to +specify additional query conditions. + +The following example will return the documents that contain the term +``tomorrow`` **AND** the ``speaker`` is ``macbeth``: + +.. code-block:: javascript + + db.quotes.runCommand( "text", { search: "tomorrow", + filter: { speaker : "macbeth" } } ) + +.. seealso:: + + :doc:`/tutorial/limit-number-of-items-scanned-for-text-search` + +.. _text-search-language: + +Search for Text in Specific Languages +------------------------------------- + +You can specify the language that determines the tokenization, +stemming, and removal of stop words, as in the following example: + +.. code-block:: javascript + + db.quotes.runCommand( "text", { search: "amor", language: "spanish" } ) + +.. seealso:: + + :doc:`/tutorial/create-text-index-on-multi-language-collection` + +See :ref:`text-search-languages` for a list of supported languages. 
+ +Text Search Output +------------------ + +The :dbcommand:`text` command returns a document that contains the +result set. + +See :ref:`text-search-output` for information on the output.