DOCS-332 aggregation update #102

Merged
merged 3 commits into from
Aug 17, 2012
97 changes: 49 additions & 48 deletions source/applications/aggregation.txt
@@ -2,26 +2,26 @@
Aggregation Framework
=====================

.. versionadded:: 2.1.0
.. versionadded:: 2.1

.. default-domain:: mongodb

Overview
--------

The MongoDB aggregation framework provides a means to calculate
aggregate values without having to use :term:`map-reduce`. While
map-reduce is powerful, using map-reduce is more difficult than
aggregated values without having to use :term:`map-reduce`. While
map-reduce is powerful, it is often more difficult than
necessary for many simple aggregation tasks, such as totaling or
averaging field values.

If you're familiar with :term:`SQL`, the aggregation framework
provides similar functionality to ``GROUP BY`` and related SQL
operators as well as simple forms of "self joins." Additionally, the
aggregation framework provides projection capabilities to reshape the
returned data. Using projections and aggregation, you can add computed
fields, create new virtual sub-objects, and extract sub-fields into
the top-level of results.
returned data. Using the projections in the aggregation framework, you
can add computed fields, create new virtual sub-objects, and extract
sub-fields into the top-level of results.

.. seealso:: A presentation from MongoSV 2011: `MongoDB's New
Aggregation Framework <http://www.10gen.com/presentations/mongosv-2011/mongodbs-new-aggregation-framework>`_
@@ -61,11 +61,7 @@ through the pipeline.
input document: operators may also generate new documents or filter
out documents.

.. warning::

The pipeline cannot operate on collections with documents that contain
one of the following "special fields:" ``MinKey``, ``MaxKey``,
``EOO``, ``Undefined``, ``DBRef``, ``Code``.
.. include:: /includes/warning-aggregation-types.rst

.. seealso:: The ":doc:`/reference/aggregation`" reference includes
documentation of the following pipeline operators:
@@ -78,28 +74,24 @@ through the pipeline.
- :agg:pipeline:`$group`
- :agg:pipeline:`$sort`

.. - :agg:pipeline:`$out`

.. _aggregation-expressions:

Expressions
~~~~~~~~~~~

Expressions calculate values from documents as they pass through the
pipeline and collect these results with calculated values from the
other documents that have flowed through the pipeline. The
aggregation framework defines expressions in a :term:`JSON` like format using
prefixes.
:ref:`Expressions <aggregation-expression-operators>` produce output
documents based on calculations performed on input documents. The
aggregation framework defines expressions in a document format
using prefixes.

Often, expressions are stateless and are only evaluated when seen by
the aggregation process. Stateless expressions perform operations such
as adding the values of two fields together or extracting the year
from a date.
Expressions are stateless and are only evaluated when seen by the
aggregation process. All aggregation expressions operate only on
the current document in the pipeline and cannot integrate data from
other documents.

The :term:`accumulator` expressions *do* retain state, and the
:agg:pipeline:`$group` operator maintains that state (e.g.
totals, maximums, minimums, and related data.) as documents progress
through the :term:`pipeline`.
The :term:`accumulator` expressions used in the :agg:pipeline:`$group`
operator are the exception: they maintain state (e.g. totals, maximums,
minimums, and related data) as documents progress through the
:term:`pipeline`.
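
The distinction between stateless expressions and accumulators can be sketched in plain JavaScript, outside of MongoDB. This is only an illustration: the ``price`` and ``tax`` fields and the ``makeSum`` helper are hypothetical and are not part of the aggregation framework.

```javascript
// Stateless expression: the result depends only on the current document.
const total = (doc) => doc.price + doc.tax;

// Accumulator (hypothetical helper): state is carried from one document
// to the next, in the way $group is described as tracking totals.
function makeSum() {
  let sum = 0;
  return {
    add(value) { sum += value; },
    value() { return sum; },
  };
}

// Hypothetical input documents.
const docs = [
  { price: 10, tax: 1 },
  { price: 20, tax: 2 },
];

// Each per-document result is computed independently of the others.
const perDoc = docs.map(total);

// The accumulator sees every document and keeps a running total.
const acc = makeSum();
for (const doc of docs) {
  acc.add(total(doc));
}
const grandTotal = acc.value();

console.log(perDoc, grandTotal);
```

The per-document values never see each other, while the accumulator's result depends on every document it has been given.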

.. seealso:: :ref:`Aggregation expressions
<aggregation-expression-operators>` for additional examples of the
@@ -111,17 +103,15 @@ Use
Invocation
~~~~~~~~~~

Invoke an :term:`aggregation` operation with the :func:`aggregate`
Invoke an :term:`aggregation` operation with the :func:`aggregate() <db.collection.aggregate()>`
wrapper in the :program:`mongo` shell or the :dbcommand:`aggregate`
:term:`database command`. Always call :func:`aggregate` on a
collection object, which will determine the documents that contribute
to the beginning of the aggregation :term:`pipeline`. The arguments to
the :func:`aggregate` function specifies a sequence of :ref:`pipeline
:term:`database command`. Always call :func:`aggregate() <db.collection.aggregate()>` on a
collection object that determines the input documents to the aggregation :term:`pipeline`.
The arguments to the :func:`aggregate() <db.collection.aggregate()>` function specify a sequence of :ref:`pipeline
operators <aggregation-pipeline-operator-reference>`, where each
:ref:`pipeline operator <aggregation-pipeline-operator-reference>` may
have a number of operands.
operator may have a number of operands.

First, consider a :term:`collection` of documents named ``article``
First, consider a :term:`collection` of documents named ``articles``
using the following format:

.. code-block:: javascript
@@ -146,7 +136,7 @@ command:

.. code-block:: javascript

db.article.aggregate(
db.articles.aggregate(
{ $project : {
author : 1,
tags : 1,
@@ -158,12 +148,12 @@ command:
} }
);

This operation uses the :func:`aggregate` wrapper around the
:term:`database command` :dbcommand:`aggregate`. The aggregation
pipleine begins with the :term:`collection` ``article`` and selects
the ``author`` and ``tags`` fields using the :agg:pipeline:`$project`
aggregation operator, and runs the :agg:expression:`$unwind` and
:agg:expression:`$group` on these fields to pivot the data.
The aggregation pipeline begins with the :term:`collection`
``articles`` and selects the ``author`` and ``tags`` fields using the
:agg:pipeline:`$project` aggregation operator, runs the
:agg:pipeline:`$unwind` operator to produce one output document per
tag, and finally uses the :agg:pipeline:`$group` operator on these
fields to pivot the data.
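
The three stages described above can be approximated in plain JavaScript without a running MongoDB instance. This is a rough sketch, not the server's implementation: the sample documents are hypothetical, and because part of the pipeline is collapsed in this diff, grouping by tag and collecting distinct authors is an assumption about what the pivot does.

```javascript
// Hypothetical sample documents; only author and tags come from the prose.
const articles = [
  { title: "a", author: "bob", tags: ["fun", "good"] },
  { title: "b", author: "ann", tags: ["fun"] },
  { title: "c", author: "bob", tags: ["good"] },
];

// $project-like step: keep only the author and tags fields.
const projected = articles.map(({ author, tags }) => ({ author, tags }));

// $unwind-like step: emit one output document per tag.
const unwound = projected.flatMap(({ author, tags }) =>
  tags.map((tag) => ({ author, tags: tag }))
);

// $group-like step (assumed): group by tag, collect distinct authors.
const groups = new Map();
for (const { author, tags } of unwound) {
  if (!groups.has(tags)) {
    groups.set(tags, new Set());
  }
  groups.get(tags).add(author);
}

// Sort the authors so the output order is deterministic.
const result = [...groups].map(([tag, authors]) => ({
  _id: tag,
  authors: [...authors].sort(),
}));

console.log(JSON.stringify(result));
```

The three input documents become four unwound documents, one per author and tag pair, which the grouping step then turns into one document per tag.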

Result
~~~~~~
@@ -177,13 +167,7 @@ The aggregation operation in the previous section returns a
if there was an error

As a document, the result is subject to the current :ref:`BSON
Document size <limit-bson-document-size>`.

.. OMMITED: as $out will not be available in 2.2
..
.. If you expect the aggregation framework to return a larger result,
.. consider using the use the :agg:pipeline:`$out` pipeline operator to
.. write the output to a collection.
Document size <limit-bson-document-size>`, which is 16 megabytes.

Optimizing Performance
----------------------
@@ -277,3 +261,20 @@ values and then divides.
.. [#match-sharding] If an early :agg:pipeline:`$match` can exclude
shards through the use of the shard key in the predicate, then
these operators are only pushed to the relevant shards.

Limitations
-----------

Aggregation operations with the :dbcommand:`aggregate` command have
the following limitations:

- The pipeline cannot operate on values of the following types:
``Binary``, ``Symbol``, ``MinKey``, ``MaxKey``, ``DBRef``, ``Code``,
``CodeWScope``.

- Output from the :term:`pipeline` is limited to 16 megabytes. If
your result set exceeds this limit, the :dbcommand:`aggregate`
command produces an error.

- If any single aggregation operation consumes more than 10 percent of
system RAM, the operation will produce an error.
5 changes: 5 additions & 0 deletions source/includes/warning-aggregation-type.rst
@@ -0,0 +1,5 @@
.. warning::

The pipeline cannot operate on values of the following types:
``Binary``, ``Symbol``, ``MinKey``, ``MaxKey``, ``DBRef``,
``Code``, and ``CodeWScope``.