read operations document: needs an initial review #272
@@ -4,29 +4,236 @@ Read Operations | |
|
||
.. default-domain:: mongodb | ||
|
||
Synopsis | ||
-------- | ||
Read operations determine how MongoDB returns collection data when you issue a query. | ||
|
||
Queries | ||
------- | ||
This document describes how MongoDB performs read operations and how | ||
different factors affect the efficiency of reads. | ||
|
||
- :doc:`/reference/operators` | ||
- :method:`find <db.collection.find()>` | ||
- :dbcommand:`findOne` | ||
.. TODO intro and high-level read operations info | ||
|
||
.. For information about queries, see ???. | ||
|
||
.. index:: read operation; query | ||
.. index:: query; read operations | ||
.. _read-operations-query-operators: | ||
|
||
Query Operations | ||
---------------- | ||
|
||
Queries retrieve data from your database collections. How a query | ||
retrieves data is dependent on MongoDB read operations and on the | ||
indexes you have created. | ||
|
||
.. _read-operations-query-syntax: | ||
|
||
Query Syntax | ||
~~~~~~~~~~~~ | ||
|
||
For a list of query operators, see :doc:`/reference/operators`. | ||
|
||
.. TODO see the yet-to-be created query operations doc | ||
|
||
.. _read-operations-indexing: | ||
|
||
Indexes | ||
~~~~~~~ | ||
|
||
Indexes significantly reduce the amount of work needed for query read | ||
operations. Indexes record specified keys and key values and the disk | ||
locations of the documents containing those values. | ||
|
||
Indexes are typically stored in RAM *or* located sequentially on disk, | ||
and indexes are smaller than the documents they catalog. When a query | ||
can use an index, the read operation is significantly faster than when | ||
the query must scan all documents in a collection. | ||
|
||
MongoDB represents indexes internally as B-trees. | ||
|
||
The most selective indexes return the fastest results. The most | ||
selective index possible for a given query is an index for which all the | ||
documents that match the query criteria also match the entire query. | ||
|
||
.. example:: | ||
Review comment: the example should have "given a collection with the following documents..." and the example's corresponding result set. |
||
|
||
Consider the following indexes, data, and query: | ||
|
||
Indexes: | ||
|
||
.. code-block:: javascript | ||
|
||
{ x:1 }, { y:1 } | ||
|
||
Data: | ||
|
||
.. code-block:: javascript | ||
|
||
{ x:1, y:2 } | ||
{ x:2, y:1 } | ||
{ x:3, y:0 } | ||
{ x:4, y:0 } | ||
|
||
Query: | ||
|
||
.. code-block:: javascript | ||
|
||
{ x:{ $gte:1 } , y:{ $gte:1} } | ||
|
||
The ``{ y:1 }`` index is more selective because all the documents | ||
that match the query's ``y`` key value also match the entire query. | ||
Conversely, not all the documents that match the query's ``x`` key | ||
value also match the entire query. | ||
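The selectivity claim above can be checked by counting matches per predicate. This is a minimal sketch in plain JavaScript that models only the four sample documents in memory; the helper name `countMatches` is made up for illustration and nothing here calls MongoDB.

```javascript
// Sample documents from the example above.
const docs = [
  { x: 1, y: 2 },
  { x: 2, y: 1 },
  { x: 3, y: 0 },
  { x: 4, y: 0 },
];

// Hypothetical helper: count the documents that satisfy a predicate.
function countMatches(collection, predicate) {
  return collection.filter(predicate).length;
}

// Candidates an index on x would hand back for x >= 1.
const matchesX = countMatches(docs, (d) => d.x >= 1);
// Candidates an index on y would hand back for y >= 1.
const matchesY = countMatches(docs, (d) => d.y >= 1);
// Documents that satisfy the entire query.
const matchesQuery = countMatches(docs, (d) => d.x >= 1 && d.y >= 1);

console.log({ matchesX, matchesY, matchesQuery });
```

The predicate whose count is closest to the whole-query count corresponds to the more selective index, because fewer candidates have to be discarded afterwards.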
|
||
.. seealso:: | ||
|
||
- The :doc:`/core/indexes` documentation, in particular :doc:`/applications/indexes` | ||
- :doc:`/reference/operators` | ||
- :method:`find <db.collection.find()>` | ||
- :method:`findOne` | ||
Review comment: parentheses after method names make them clearer. |
||
|
||
.. _read-operations-query-optimization: | ||
|
||
Query Optimization | ||
~~~~~~~~~~~~~~~~~~ | ||
|
||
MongoDB provides a query optimizer that matches a query to the index | ||
that performs the fastest read operation for that query. | ||
|
||
When you issue a query for the first time, the query optimizer runs the | ||
query against several indexes to find the most efficient. The optimizer | ||
then creates a "query plan" that specifies the index for future runs of | ||
the query. | ||
|
||
The MongoDB query optimizer deletes a query plan when a collection has | ||
changed to a point that the specified index might no longer provide | ||
the fastest results. | ||
|
||
Query plans take advantage of MongoDB's indexing features. You should | ||
always write indexes that use the same fields and that sort in the same | ||
order as do your queries. For more information, see :doc:`/applications/indexes`. | ||
|
||
MongoDB creates a query plan as follows: When you run a query for which | ||
there is no query plan, either because the query is new or the old plan | ||
is obsolete, the query optimizer runs the query against several indexes | ||
at once in parallel but records the results in a single common buffer, | ||
as though the results all come from the same index. As each index yields | ||
a match, MongoDB records the match in the buffer. If an index returns a | ||
result already returned by another index, the optimizer recognizes the | ||
duplication and skips the duplicate match. | ||
Review comment: this paragraph is a bit dense. |
||
|
||
The optimizer determines a "winning" index when either of | ||
the following occurs: | ||
|
||
- The optimizer exhausts an index, which means that the index has | ||
provided the full result set. At this point, the optimizer stops | ||
querying. | ||
|
||
- The optimizer reaches 101 results. At this point, the optimizer | ||
chooses the plan that has provided the most results *first* and | ||
continues reading only from that plan. Note that another index might | ||
have provided all those results as duplicates but because the | ||
"winning" index provided the full result set first, it is more | ||
efficient. | ||
|
||
The "winning" index now becomes the index specified in the query plan as | ||
the one to use the next time the query is run. | ||
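The race described above can be sketched as a toy simulation. This is an illustrative model under stated assumptions, not MongoDB's implementation: the function name `racePlans`, the `steps` representation (a document id for a matching index entry, `null` for a scanned entry that does not match), and the sample ids are all hypothetical.

```javascript
// Toy model only: each plan is the sequence of index entries it would scan.
// A step is a document id when the entry matches the query, or null when the
// scanned entry does not match. This is not MongoDB's implementation.
function racePlans(plans, limit = 101) {
  if (plans.length === 0) throw new Error("need at least one candidate plan");
  const buffer = new Set();              // shared, de-duplicated result buffer
  const produced = plans.map(() => 0);   // matches yielded by each plan

  for (let step = 0; ; step++) {
    for (let i = 0; i < plans.length; i++) {
      const steps = plans[i].steps;
      if (step < steps.length && steps[step] !== null) {
        produced[i]++;
        buffer.add(steps[step]);         // a Set ignores duplicate ids
      }
      if (step >= steps.length - 1) {
        // This plan scanned its last entry, so it has seen every match.
        return { winner: plans[i].name, reason: "exhausted", results: [...buffer] };
      }
      if (buffer.size >= limit) {
        // Buffer is full: pick the plan that has yielded the most matches so far.
        const best = produced.indexOf(Math.max(...produced));
        return { winner: plans[best].name, reason: "limit", results: [...buffer] };
      }
    }
  }
}

// Two candidate plans for the earlier sample data ("a".."d" are made-up ids).
const outcome = racePlans([
  { name: "{ x:1 }", steps: ["a", "b", null, null] },
  { name: "{ y:1 }", steps: ["b", "a"] },
]);
console.log(outcome.winner, outcome.reason);
```

The sketch advances the plans in lockstep rather than truly in parallel, which keeps the outcome deterministic while still showing the shared buffer, the duplicate handling, and the two stopping conditions.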
|
||
To evaluate the optimizer's choice of query plan, run the query again | ||
with the :method:`hint() <cursor.hint()>` and | ||
:method:`explain() <cursor.explain()>` methods appended. Instead of returning | ||
query results, this returns statistics about how the query runs. For example: | ||
|
||
.. code-block:: javascript | ||
|
||
// hint() must come before explain(); assumes an index on { name:1 } exists | ||
db.people.find( { name:"John" } ).hint( { name:1 } ).explain() | ||
|
||
For details on the output, see :method:`explain() <cursor.explain()>`. | ||
Review comment: we should have full explain output in some other form. |
||
|
||
.. note:: | ||
|
||
If you run :method:`explain() <cursor.explain()>` without including | ||
:method:`hint() <cursor.hint()>`, the query optimizer will | ||
re-evaluate the query and run against multiple indexes before | ||
returning the query statistics. Unless you want the optimizer to | ||
re-evaluate the query, do not leave off :method:`hint() | ||
<cursor.hint()>`. | ||
Review comment: I actually don't think this is such a bad thing, and it doesn't require such a firm admonition. |
||
|
||
Because your collections will likely change over time, the query | ||
optimizer deletes a query plan and re-evaluates the indexes when any | ||
of the following occur: | ||
|
||
- The number of writes to the collection reaches 1,000. | ||
|
||
- You run the :dbcommand:`reIndex` command on the index. | ||
|
||
- You restart :program:`mongod`. | ||
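The three invalidation triggers listed above can be modeled with a small in-memory cache. This is a hedged sketch: the class `PlanCache`, its method names, and the query-shape string are invented for illustration, and the 1,000-write threshold is taken from the list above rather than from any MongoDB API.

```javascript
// Toy plan cache for a single collection. All names are hypothetical.
class PlanCache {
  constructor(writeLimit = 1000) {
    this.writeLimit = writeLimit;
    this.writes = 0;
    this.plans = new Map();        // query shape -> winning index name
  }
  remember(queryShape, indexName) {
    this.plans.set(queryShape, indexName);
  }
  lookup(queryShape) {
    return this.plans.get(queryShape);   // undefined means "re-evaluate"
  }
  clear() {
    this.plans.clear();
    this.writes = 0;
  }
  recordWrite() {
    // Drop every cached plan once the write counter reaches the threshold.
    if (++this.writes >= this.writeLimit) this.clear();
  }
  reIndex() { this.clear(); }      // rebuilding an index discards plans
  restart() { this.clear(); }      // plans do not survive a restart
}

const cache = new PlanCache();
cache.remember("name-equality", "name_1");
for (let i = 0; i < 999; i++) cache.recordWrite();
const planBefore = cache.lookup("name-equality");   // still cached
cache.recordWrite();                                // 1,000th write
const planAfter = cache.lookup("name-equality");    // cache was cleared
console.log(planBefore, planAfter);
```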
|
||
When you re-evaluate a query, the optimizer will display the same | ||
results (assuming no data has changed) but might display the results in | ||
a different order, and the :method:`explain() <cursor.explain()>` and | ||
:method:`hint() <cursor.hint()>` methods might report different | ||
statistics. This is because the optimizer retrieves the results from | ||
several indexes at once during re-evaluation and the order in which | ||
results appear depends on the order of the indexes within the parallel | ||
querying. | ||
|
||
.. _read-operations-projection: | ||
|
||
Projection | ||
~~~~~~~~~~ | ||
|
||
A projection specifies which field values from an array a query should | ||
Review comment: any kind of document. |
||
return for matching documents. If you run a query *without* a | ||
projection, the query returns all fields and values for matching | ||
documents, which can add unnecessary network and deserialization costs. | ||
|
||
To run the most efficient queries, use the following projection | ||
operators when possible when querying on array values. For documentation | ||
on each operator, click the operator name: | ||
|
||
- :projection:`$elemMatch` | ||
|
||
- :projection:`$slice` | ||
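To make the saving concrete, the following sketch emulates the single-number form of ``$slice`` on an in-memory document. It is an approximation for illustration only: the function `sliceProjection` and the sample document are hypothetical, and the ``[ skip, limit ]`` form is deliberately left out.

```javascript
// Emulates { field: { $slice: n } } for a single number n:
// positive n keeps the first n elements, negative n keeps the last |n|.
// Hypothetical helper; it does not talk to a database.
function sliceProjection(doc, field, n) {
  const arr = doc[field];
  const sliced = n >= 0 ? arr.slice(0, n) : arr.slice(n);
  // Other fields are returned unchanged; only the array is trimmed.
  return { ...doc, [field]: sliced };
}

const post = { _id: 1, title: "Reads", comments: ["a", "b", "c", "d"] };

const firstTwo = sliceProjection(post, "comments", 2);
const lastOne = sliceProjection(post, "comments", -1);
console.log(firstTwo.comments, lastOne.comments);
```

Trimming the array on the server side means fewer array elements cross the network and need deserializing, which is the cost the paragraph above describes.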
|
||
.. _read-operations-aggregation: | ||
|
||
Aggregation | ||
----------- | ||
~~~~~~~~~~~ | ||
|
||
.. Probably short, but there's no docs for old-style aggregation so. | ||
|
||
.. - basic aggregation (count, distinct) | ||
.. - legacy agg: group | ||
.. - big things: mapreduce, aggregation | ||
|
||
.. seealso:: :doc:`/applications/aggregation` | ||
|
||
Indexing | ||
-------- | ||
.. index:: read operation; architecture | ||
.. _read-operations-architecture: | ||
|
||
Query Operators that Cannot Use Indexes | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Some query operators cannot take advantage of indexes and require a | ||
collection scan. When using these operators you can narrow the documents | ||
scanned by combining the operator with another operator that does use an | ||
index. | ||
|
||
Operators that cannot use indexes include the following: | ||
Review comment: they can use the index, it's just ineffective. |
||
|
||
.. seealso:: :doc:`/core/indexes` | ||
- :operator:`$nin` | ||
|
||
- :operator:`$ne` | ||
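The review note on this section says these operators can use an index but gain little from it. A rough way to see why is to count how many index entries each kind of predicate has to visit. The sketch below is a toy model with invented names (`entriesForEq`, `entriesForNe`) and synthetic keys; it only counts entries and does not reflect how MongoDB walks a B-tree.

```javascript
// Toy "index": 1,000 sorted key values, 100 entries for each key 0..9.
const indexKeys = Array.from({ length: 1000 }, (_, i) => i % 10)
  .sort((a, b) => a - b);

// An equality match only needs the contiguous run of entries equal to the key.
function entriesForEq(sortedKeys, key) {
  return sortedKeys.filter((k) => k === key).length;
}

// A $ne match needs every entry outside that run.
function entriesForNe(sortedKeys, key) {
  return sortedKeys.length - entriesForEq(sortedKeys, key);
}

const eqVisited = entriesForEq(indexKeys, 3);
const neVisited = entriesForNe(indexKeys, 3);
console.log({ eqVisited, neVisited });
```

With evenly distributed keys, the negated predicate still touches most of the index, so pairing it with a more selective condition narrows the scan far more than the index alone does.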
|
||
.. TODO Regular expressions queries also do not use an index. | ||
.. TODO :method:`cursor.skip()` can cause paginating large numbers of docs | ||
|
||
Architecture | ||
------------ | ||
|
||
.. index:: read operation; connection pooling | ||
.. index:: connection pooling; read operations | ||
.. _read-operations-connection-pooling: | ||
|
||
Connection Pooling | ||
~~~~~~~~~~~~~~~~~~ | ||
|
||
|
@@ -35,3 +242,4 @@ Shard Clusters | |
|
||
Replica Sets | ||
~~~~~~~~~~~~ | ||
|
Review comment: we have a lot of index documentation already. The example below is good, but I think we can reduce some of the front matter.