DOCS-1147 and DOCS-1206 text index sharded cluster and replica sets and reorg #725

Merged
merged 1 commit into from
Mar 15, 2013
15 changes: 15 additions & 0 deletions source/applications.txt
@@ -57,3 +57,18 @@ The following documents provide patterns for developing application features:
tutorial/isolate-sequence-of-operations
tutorial/create-an-auto-incrementing-field
tutorial/expire-data

Text Search Patterns
--------------------

The following tutorials provide some patterns for
text search usage:

.. toctree::
:maxdepth: 1

tutorial/enable-text-search
tutorial/search-for-text
tutorial/create-text-index-on-multi-language-collection
tutorial/return-text-queries-using-only-text-index
tutorial/limit-number-of-items-scanned-for-text-search
111 changes: 111 additions & 0 deletions source/applications/text-search.txt
@@ -0,0 +1,111 @@
===========
Text Search
===========

.. default-domain:: mongodb

.. versionadded:: 2.4

Overview
--------

Text search supports the search of string content in documents of a
collection. Text search introduces a new :ref:`text
<index-feature-text>` index type and a new :dbcommand:`text` command.

The text search process:

- tokenizes and stems terms both when building the index and when
executing the :dbcommand:`text` command.

- assigns a score to each document that contains the search term in the
indexed fields. The score determines the relevance of a document to a
given search query.

By default, the :dbcommand:`text` command returns at most the top 100
matching documents as determined by the scores.
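
The following is a minimal sketch of the tokenize-and-stem idea, written
as plain JavaScript. It is not MongoDB's tokenizer or stemmer:
``STOP_WORDS``, ``naiveStem``, ``toTerms``, and ``matches`` are
hypothetical names, and the suffix rules are deliberately simplistic.
It only illustrates why applying the same processing to indexed text
and to the search string lets related word forms match.

```javascript
// Toy illustration only. This is NOT MongoDB's tokenizer or stemmer.
// STOP_WORDS and naiveStem are hypothetical stand-ins.
const STOP_WORDS = new Set(["the", "an", "a", "and"]);

// Very naive suffix stripping, e.g. "blueberries" becomes "blueberry".
function naiveStem(word) {
  if (word.endsWith("ies")) return word.slice(0, -3) + "y";
  if (word.endsWith("s") && !word.endsWith("ss")) return word.slice(0, -1);
  return word;
}

// Lowercase, split on non-alphanumeric characters, drop stop words, stem.
function toTerms(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word))
    .map(naiveStem);
}

// A search matches when every processed search term appears among the
// processed terms of the indexed text.
function matches(search, indexedText) {
  const indexedTerms = toTerms(indexedText);
  return toTerms(search).every((term) => indexedTerms.includes(term));
}

const indexedText = "The blueberries and a cake";
console.log(toTerms(indexedText));              // [ 'blueberry', 'cake' ]
console.log(matches("blueberry", indexedText)); // true
console.log(matches("blue", indexedText));      // false
```

Because both sides go through ``toTerms``, a search for ``blueberry``
finds text that contains ``blueberries``, while the partial word
``blue`` does not match.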

.. _create-text-index:

Create a ``text`` Index
-----------------------

To perform text search, create a ``text`` index on the field or fields
whose value is a string or an array of string elements. To create a
``text`` index, use the :method:`db.collection.ensureIndex()` method
with a document that contains field and value pairs where the value is
the string literal ``text``.

.. important::

- Before you can :ref:`create a text index <create-text-index>` or
:ref:`run the text command <text-search-text-command>`, you need
to manually enable text search. See
:doc:`/tutorial/enable-text-search` for information on how to
enable the text search feature.

- Text indexes have significant storage requirements and performance
costs. See :ref:`text index feature <index-feature-text>` for more
information.

- .. include:: /includes/fact-text-index-limit-one.rst

The following example creates a ``text`` index on the fields
``subject`` and ``content``:

.. code-block:: javascript

db.collection.ensureIndex(
{
subject: "text",
content: "text"
}
)

This ``text`` index catalogs all string data in the ``subject`` field
and the ``content`` field, where the field value is either a string or
an array of string elements.

See :doc:`/core/text-index` for details on the options available when
creating ``text`` indexes.

``text`` indexes can also be combined with ascending/descending index
fields. See:

- :doc:`/tutorial/limit-number-of-items-scanned-for-text-search`

- :doc:`/tutorial/return-text-queries-using-only-text-index`

.. _text-search-text-command:

``text`` Command
----------------

The :dbcommand:`text` command can search for words and phrases. The
command matches on complete stemmed words. For example, if a
document field contains the word ``blueberry``, a search on the term
``blue`` will not match the document. However, a search on either
``blueberry`` or ``blueberries`` will match.

By default, the :dbcommand:`text` command returns the top 100 scoring
documents in descending order of score, but you can specify a ``limit``
option to change the maximum number to return.

Given a collection with a ``text`` index, use the
:method:`~db.collection.runCommand()` method to execute the
:dbcommand:`text` command, as in:

.. code-block:: javascript

db.collection.runCommand( "text" , { search: <string> } )

For information and examples on various text search patterns, see
:doc:`/tutorial/search-for-text`.
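
As a hedged sketch, the options document for the :dbcommand:`text`
command can be assembled in plain JavaScript before passing it to
:method:`~db.collection.runCommand()`. The helper name
``textCommandOptions`` is hypothetical and not part of the shell; it
uses only the ``search`` and ``limit`` options described above.

```javascript
// Hypothetical helper: builds the options document that is passed as the
// second argument to db.collection.runCommand("text", ...).
function textCommandOptions(search, limit) {
  if (typeof search !== "string" || search.length === 0) {
    throw new Error("search must be a non-empty string");
  }
  const options = { search: search };
  // Only include limit when the caller wants to override the default.
  if (limit !== undefined) {
    options.limit = limit;
  }
  return options;
}

const options = textCommandOptions("blueberry", 10);
console.log(options);

// In the mongo shell, with text search enabled and a text index in place,
// the document would be used like this:
//   db.collection.runCommand( "text", options )
```

Omitting the second argument leaves ``limit`` out of the document, so
the command falls back to its default of 100 documents.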

Text Search Output
------------------

The :dbcommand:`text` command returns a document that contains the
result set.

See :ref:`text-search-output` for information on the output.
1 change: 1 addition & 0 deletions source/contents.txt
@@ -10,6 +10,7 @@ MongoDB Manual Contents
security
crud
aggregation
applications/text-search
indexes
replication
sharding
46 changes: 46 additions & 0 deletions source/core/indexes.txt
@@ -770,6 +770,52 @@ indexes are not suited for finding the closest documents to a
particular location, when the closest documents are far away compared
to bucket size.

.. index:: index; text
.. index:: text index
.. _index-feature-text:

``text`` Indexes
~~~~~~~~~~~~~~~~

.. versionadded:: 2.4

MongoDB provides ``text`` indexes to support :doc:`text search
</applications/text-search>` on a collection. You can only access the
``text`` index with the :dbcommand:`text` command.

``text`` indexes are case-insensitive and can include any field that
contains string data. ``text`` indexes drop language-specific stop
words (e.g. in English, “the,” “an,” “a,” “and,” etc.) and use simple
language-specific suffix stemming. See :ref:`text-search-languages` for
the supported languages.

``text`` indexes have the following storage requirements and
performance costs:

- Text indexes can be large. They contain one index entry for each
unique post-stemmed word in each indexed field for each document
inserted.

- Building a ``text`` index is very similar to building a large
multi-key index, and will take longer than building a simple ordered
(scalar) index on the same data.

- When building a large ``text`` index on an existing collection,
ensure that you have a sufficiently high open file descriptor limit.
See the :ref:`recommended settings <oom-killer>`.

- ``text`` indexes will impact insertion throughput because MongoDB
must add an index entry for each unique post-stemmed word in each
indexed field of each new source document.

- Additionally, ``text`` indexes do not store phrases or information
about the proximity of words in the documents. As a result, phrase
queries will run much more efficiently when the entire collection
fits in RAM.
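
To get a feel for the first cost listed above, the following sketch
counts unique lowercase words per indexed field of a single document.
It is a rough proxy only: ``estimateIndexEntries`` is a hypothetical
helper, and it performs no stop-word removal or stemming, so a real
``text`` index would typically hold fewer entries for the same
document.

```javascript
// Rough proxy only: counts unique lowercase words per field.
// No stop-word removal or stemming is applied, unlike a real text index.
function estimateIndexEntries(doc, fields) {
  let entries = 0;
  for (const field of fields) {
    const value = doc[field];
    // An indexed field holds either a string or an array of strings.
    const strings = Array.isArray(value) ? value : [value];
    const unique = new Set();
    for (const s of strings) {
      if (typeof s !== "string") continue;
      for (const word of s.toLowerCase().split(/[^a-z0-9]+/)) {
        if (word.length > 0) unique.add(word);
      }
    }
    entries += unique.size;
  }
  return entries;
}

const sample = {
  subject: "Coffee and cake",
  content: "Coffee, coffee, and more coffee"
};

// subject contributes 3 unique words and content contributes 3.
console.log(estimateIndexEntries(sample, ["subject", "content"])); // 6
```

Repeated words within one field count once, but the same word in two
different fields counts twice, which is why indexing many fields grows
the index quickly.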

See :doc:`/applications/text-search` for more information on the text
search feature.

.. index:: index; limitations
.. _index-limitations:

186 changes: 186 additions & 0 deletions source/core/text-index.txt
@@ -0,0 +1,186 @@
:orphan:

==============
``text`` Index
==============

.. default-domain:: mongodb

This document provides details on some of the options available when
creating ``text`` indexes.

Specify a Name for the ``text`` Index
-------------------------------------

The default name for the index consists of each index field name
concatenated with ``_text``. Consider the following ``text`` index on
the fields ``content``, ``users.comments``, and ``users.profiles``:

.. code-block:: javascript

db.collection.ensureIndex(
{
content: "text",
"users.comments": "text",
"users.profiles": "text"
}
)

The default name for the index is:

.. code-block:: javascript

"content_text_users.comments_text_users.profiles_text"
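
The naming rule can be sketched in plain JavaScript as follows. The
function ``defaultIndexName`` is a hypothetical helper written for
illustration, not a shell method; it joins each field name and its
index type with underscores, in the order the fields appear in the
index specification.

```javascript
// Hypothetical helper that mirrors the default naming rule:
// each field name is followed by "_" and its index type, joined by "_".
function defaultIndexName(keys) {
  return Object.keys(keys)
    .map((field) => field + "_" + keys[field])
    .join("_");
}

const indexName = defaultIndexName({
  content: "text",
  "users.comments": "text",
  "users.profiles": "text"
});

console.log(indexName);
// content_text_users.comments_text_users.profiles_text
```

Computing the name up front makes it easy to see how quickly it grows
as more fields are added to the index.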

To avoid creating an index with a name that exceeds the :limit:`index
name length limit <Index Name Length>`, you can pass the ``name``
option to the :method:`db.collection.ensureIndex()` method:

.. code-block:: javascript

db.collection.ensureIndex(
{
content: "text",
"users.comments": "text",
"users.profiles": "text"
},
{
name: "MyTextIndex"
}
)

.. note::

To drop the ``text`` index, use the index name. To get the name of
an index, use :method:`db.collection.getIndexes()`.

Index All Fields
----------------

To allow text search on all fields with string content, use the
wildcard specifier (``$**``) in place of a field name when creating the
``text`` index.

The following example indexes any string value in the data of every
field of every document in a collection and names the index
``TextIndex``:

.. code-block:: javascript

db.collection.ensureIndex(
{ "$**": "text" },
{ name: "TextIndex" }
)
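
To visualize which values of a document count as string content, the
following sketch walks a document and collects every string it finds,
including strings in arrays and embedded documents. ``collectStrings``
is a hypothetical helper for illustration only and does not describe
how the server traverses documents.

```javascript
// Illustration only: gathers every string value found anywhere in a
// document, descending into arrays and embedded documents.
function collectStrings(value, out = []) {
  if (typeof value === "string") {
    out.push(value);
  } else if (Array.isArray(value)) {
    for (const item of value) collectStrings(item, out);
  } else if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) collectStrings(value[key], out);
  }
  return out;
}

const wildcardSample = {
  _id: 1,
  title: "Cake",
  tags: ["dessert"],
  author: { name: "Ana" },
  views: 3
};

console.log(collectStrings(wildcardSample)); // [ 'Cake', 'dessert', 'Ana' ]
```

Non-string values such as ``_id`` and ``views`` in this sample are
skipped.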

.. _text-index-default-language:

Specify Languages for Text Index
--------------------------------

The default language associated with the indexed data determines the
list of stop words and the rules for the stemmer and tokenizer. The
default language for the indexed data is ``english``.

To specify a different language, use the ``default_language`` option
when creating the ``text`` index. See :ref:`text-search-languages` for
the languages available for ``default_language``.

The following example creates a ``text`` index on the
``content`` field and sets the ``default_language`` to
``spanish``:

.. code-block:: javascript

db.collection.ensureIndex(
{ content : "text" },
{ default_language: "spanish" }
)

.. seealso::

:doc:`/tutorial/create-text-index-on-multi-language-collection`

.. _text-index-internals-weights:

Control Results of Text Search with Weights
-------------------------------------------

By default, the :dbcommand:`text` command returns matching documents
based on scores, from highest to lowest. For a ``text`` index, the
*weight* of an indexed field denotes the significance of the field
relative to the other indexed fields in terms of the score. The score
calculation for a given word in a document includes the weighted sum of
the frequency for each of the indexed fields in that document.

The default weight is 1 for the indexed fields. To adjust the weights
for the indexed fields, include the ``weights`` option in the
:method:`db.collection.ensureIndex()` method.

.. warning::

Choose the weights carefully in order to prevent the need to reindex.

A collection ``blog`` has the following documents:

.. code-block:: javascript

{ _id: 1,
content: "This morning I had a cup of coffee.",
about: "beverage",
keywords: [ "coffee" ]
}

{ _id: 2,
content: "Who doesn't like cake?",
about: "food",
keywords: [ "cake", "food", "dessert" ]
}

To create a ``text`` index with different field weights for the
``content`` field and the ``keywords`` field, include the ``weights``
option to the :method:`~db.collection.ensureIndex()` method.

.. code-block:: javascript

db.blog.ensureIndex(
{
content: "text",
keywords: "text",
about: "text"
},
{
weights: {
content: 10,
keywords: 5
},
name: "TextIndex"
}
)

The ``text`` index has the following fields and weights:

- ``content`` has a weight of 10,

- ``keywords`` has a weight of 5, and

- ``about`` has the default weight of 1.

These weights denote the relative significance of the indexed fields to
each other. For instance, a term match in the ``content`` field has:

- ``2`` times (i.e. ``10:5``) the impact of a term match in the
``keywords`` field and

- ``10`` times (i.e. ``10:1``) the impact of a term match in the
``about`` field.
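
The weighted sum of frequencies can be sketched in plain JavaScript
using the ``blog`` documents and the weights from this section. This is
not the scoring formula that the :dbcommand:`text` command uses; the
helpers ``countTerm`` and ``weightedFrequency`` are hypothetical, apply
no stemming, and compute only the weighted-sum component described
above.

```javascript
// Counts exact, case-insensitive occurrences of a term in a string or
// an array of strings. No stemming is applied in this sketch.
function countTerm(value, term) {
  const strings = Array.isArray(value) ? value : [value];
  let count = 0;
  for (const s of strings) {
    if (typeof s !== "string") continue;
    for (const word of s.toLowerCase().split(/[^a-z0-9]+/)) {
      if (word === term) count += 1;
    }
  }
  return count;
}

// Sums weight * frequency over the indexed fields. This is only the
// weighted-sum component, not MongoDB's actual score.
function weightedFrequency(doc, weights, term) {
  let sum = 0;
  for (const field of Object.keys(weights)) {
    sum += weights[field] * countTerm(doc[field], term);
  }
  return sum;
}

const weights = { content: 10, keywords: 5, about: 1 };

const coffeePost = {
  _id: 1,
  content: "This morning I had a cup of coffee.",
  about: "beverage",
  keywords: ["coffee"]
};

const cakePost = {
  _id: 2,
  content: "Who doesn't like cake?",
  about: "food",
  keywords: ["cake", "food", "dessert"]
};

console.log(weightedFrequency(coffeePost, weights, "coffee")); // 15
console.log(weightedFrequency(cakePost, weights, "food"));     // 6
```

For ``coffee``, the match in ``content`` contributes 10 and the match
in ``keywords`` contributes 5. For ``food``, the match in ``keywords``
contributes 5 and the match in ``about`` contributes 1.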

Tutorials
---------

The following tutorials offer additional ``text`` index creation
patterns:

- :doc:`/tutorial/create-text-index-on-multi-language-collection`

- :doc:`/tutorial/limit-number-of-items-scanned-for-text-search`

- :doc:`/tutorial/return-text-queries-using-only-text-index`
1 change: 1 addition & 0 deletions source/includes/fact-text-index-limit-one.rst
@@ -0,0 +1 @@
A collection can have at most **one** ``text`` index.
10 changes: 10 additions & 0 deletions source/includes/fact-text-search-beta.rst
@@ -0,0 +1,10 @@
:doc:`Text search </applications/text-search>` is currently a
*beta* feature. As a beta feature:

- You need to explicitly enable the feature before :ref:`creating a text
index <create-text-index>` or using the :dbcommand:`text` command.

- To enable text search on :doc:`replica sets </core/replication>` and
:doc:`sharded clusters </core/sharded-clusters>`, you need to
enable it on **each and every** :program:`mongod` for replica
sets and on **each and every** :program:`mongos` for sharded clusters.