diff --git a/source/applications/aggregation.txt b/source/applications/aggregation.txt index 5b02ebbab65..d443ff46292 100644 --- a/source/applications/aggregation.txt +++ b/source/applications/aggregation.txt @@ -2,7 +2,7 @@ Aggregation Framework ===================== -.. versionadded:: 2.1.0 +.. versionadded:: 2.1 .. default-domain:: mongodb @@ -10,8 +10,8 @@ Overview -------- The MongoDB aggregation framework provides a means to calculate -aggregate values without having to use :term:`map-reduce`. While -map-reduce is powerful, using map-reduce is more difficult than +aggregated values without having to use :term:`map-reduce`. While +map-reduce is powerful, it is often more difficult than necessary for many simple aggregation tasks, such as totaling or averaging field values. @@ -19,9 +19,9 @@ If you're familiar with :term:`SQL`, the aggregation framework provides similar functionality to ``GROUP BY`` and related SQL operators as well as simple forms of "self joins." Additionally, the aggregation framework provides projection capabilities to reshape the -returned data. Using projections and aggregation, you can add computed -fields, create new virtual sub-objects, and extract sub-fields into -the top-level of results. +returned data. Using the projections in the aggregation framework, you +can add computed fields, create new virtual sub-objects, and extract +sub-fields into the top-level of results. .. seealso:: A presentation from MongoSV 2011: `MongoDB's New Aggregation Framework `_ @@ -61,11 +61,7 @@ through the pipeline. input document: operators may also generate new documents or filter out documents. -.. warning:: - - The pipeline cannot operate on collections with documents that contain - one of the following "special fields:" ``MinKey``, ``MaxKey``, - ``EOO``, ``Undefined``, ``DBRef``, ``Code``. +.. include:: /includes/warning-aggregation-types.rst .. 
seealso:: The ":doc:`/reference/aggregation`" reference includes documentation of the following pipeline operators: @@ -78,28 +74,24 @@ through the pipeline. - :agg:pipeline:`$group` - :agg:pipeline:`$sort` - .. - :agg:pipeline:`$out` - .. _aggregation-expressions: Expressions ~~~~~~~~~~~ -Expressions calculate values from documents as they pass through the -pipeline and collect these results with calculated values from the -other documents that have flowed through the pipeline. The -aggregation framework defines expressions in a :term:`JSON` like format using -prefixes. +:ref:`Expressions ` produce output +documents based on calculations performed on input documents. The +aggregation framework defines expressions using a document format +with prefixes. -Often, expressions are stateless and are only evaluated when seen by -the aggregation process. Stateless expressions perform operations such -as adding the values of two fields together or extracting the year -from a date. +Expressions are stateless and are only evaluated when seen by the +aggregation process. All aggregation expressions can only operate on +the current document in the pipeline and cannot integrate data from +other documents. -The :term:`accumulator` expressions *do* retain state, and the -:agg:pipeline:`$group` operator maintains that state (e.g. -totals, maximums, minimums, and related data.) as documents progress -through the :term:`pipeline`. +The :term:`accumulator` expressions used in the :agg:pipeline:`$group` +operator maintain state (e.g. totals, maximums, minimums, and +related data) as documents progress through the :term:`pipeline`. ..
seealso:: :ref:`Aggregation expressions ` for additional examples of the @@ -111,17 +103,15 @@ Use Invocation ~~~~~~~~~~ -Invoke an :term:`aggregation` operation with the :func:`aggregate` +Invoke an :term:`aggregation` operation with the :func:`aggregate() ` wrapper in the :program:`mongo` shell or the :dbcommand:`aggregate` -:term:`database command`. Always call :func:`aggregate` on a -collection object, which will determine the documents that contribute -to the beginning of the aggregation :term:`pipeline`. The arguments to -the :func:`aggregate` function specifies a sequence of :ref:`pipeline +:term:`database command`. Always call :func:`aggregate() ` on a +collection object that determines the input documents to the aggregation :term:`pipeline`. +The arguments to the :func:`aggregate() ` function specify a sequence of :ref:`pipeline operators `, where each -:ref:`pipeline operator ` may -have a number of operands. +operator may have a number of operands. -First, consider a :term:`collection` of documents named ``article`` +First, consider a :term:`collection` of documents named ``articles`` using the following format: .. code-block:: javascript @@ -146,7 +136,7 @@ command: .. code-block:: javascript - db.article.aggregate( + db.articles.aggregate( { $project : { author : 1, tags : 1, @@ -158,12 +148,12 @@ command: } } ); -This operation uses the :func:`aggregate` wrapper around the -:term:`database command` :dbcommand:`aggregate`. The aggregation -pipleine begins with the :term:`collection` ``article`` and selects -the ``author`` and ``tags`` fields using the :agg:pipeline:`$project` -aggregation operator, and runs the :agg:expression:`$unwind` and -:agg:expression:`$group` on these fields to pivot the data.
+The aggregation pipeline begins with the :term:`collection` +``articles`` and selects the ``author`` and ``tags`` fields using the +:agg:pipeline:`$project` aggregation operator, and runs the +:agg:pipeline:`$unwind` operator to produce one output document per +tag, and finally uses the :agg:pipeline:`$group` operator on these fields +to pivot the data. Result ~~~~~~ @@ -177,13 +167,7 @@ The aggregation operation in the previous section returns a if there was an error As a document, the result is subject to the current :ref:`BSON -Document size `. - -.. OMMITED: as $out will not be available in 2.2 -.. -.. If you expect the aggregation framework to return a larger result, -.. consider using the use the :agg:pipeline:`$out` pipeline operator to -.. write the output to a collection. +Document size `, which is 16 megabytes. Optimizing Performance ---------------------- @@ -277,3 +261,20 @@ values and then divides. .. [#match-sharding] If an early :agg:pipeline:`$match` can exclude shards through the use of the shard key in the predicate, then these operators are only pushed to the relevant shards. + +Limitations +----------- + +Aggregation operations with the :dbcommand:`aggregate` command have +the following limitations: + +- The pipeline cannot operate on values of the following types: + ``Binary``, ``Symbol``, ``MinKey``, ``MaxKey``, ``DBRef``, ``Code``, + and ``CodeWScope``. + +- Output from the :term:`pipeline` can only contain 16 megabytes of data. If + your result set exceeds this limit, the :dbcommand:`aggregate` + command produces an error. + +- If any single aggregation operation consumes more than 10 percent of + system RAM, the operation will produce an error. diff --git a/source/includes/warning-aggregation-types.rst b/source/includes/warning-aggregation-types.rst new file mode 100644 index 00000000000..69d3167d9bb --- /dev/null +++ b/source/includes/warning-aggregation-types.rst @@ -0,0 +1,5 @@ +..
warning:: + + The pipeline cannot operate on values of the following types: + ``Binary``, ``Symbol``, ``MinKey``, ``MaxKey``, ``DBRef``, + ``Code``, and ``CodeWScope``. diff --git a/source/reference/aggregation.txt b/source/reference/aggregation.txt index 44471d339cc..117a1fc5739 100644 --- a/source/reference/aggregation.txt +++ b/source/reference/aggregation.txt @@ -7,43 +7,44 @@ Aggregation Framework Operators .. default-domain:: agg The aggregation framework provides the ability to project, process, -and/or control the output of the query, without using ":term:`map-reduce`." +and/or control the output of the query, without using :term:`map-reduce`. Aggregation uses a syntax that resembles the same syntax and form as "regular" MongoDB database queries. These aggregation operations are all accessible by way of the -:mongodb:func:`aggregate()`. While all examples in this document use this -function, :mongodb:func:`aggregate()` is merely a wrapper around the -:term:`database command` :mongodb:dbcommand:`aggregate`. Therefore the -following prototype aggregate are equivalent: +:mongodb:func:`aggregate()` method. While all examples in this document use this +method, :mongodb:func:`aggregate()` is merely a wrapper around the +:term:`database command` :mongodb:dbcommand:`aggregate`. The +following prototype aggregation operations are equivalent: .. code-block:: javascript - db.people.aggregate( { [pipeline] } ) - db.runCommand( { aggregate: "people", { [pipeline] } } ) + db.people.aggregate( ) + db.people.aggregate( [] ) + db.runCommand( { aggregate: "people", pipeline: [] } ) -Both of these operations perform aggregation routines on the -collection named ``people``. ``[pipeline]`` is a placeholder for -the aggregation :term:`pipeline` definition. +These operations perform aggregation routines on the +collection named ``people``. ```` is a placeholder for the +aggregation :term:`pipeline` definition. :mongodb:func:`aggregate()` +accepts the stages of the pipeline (i.e. 
````) as an array, +or as arguments to the method. This documentation provides an overview of all aggregation operators available for use in the aggregation pipeline as well as details regarding their use and behavior. -.. seealso:: ":doc:`/applications/aggregation`" and ":ref:`Aggregation - Framework Documentation Index `" for more - information on the aggregation functionality. +.. seealso:: :doc:`/applications/aggregation` overview, the + :ref:`Aggregation Framework Documentation Index + `, and the + :doc:`/tutorial/aggregation-examples` for more information on the + aggregation functionality. .. _aggregation-pipeline-operator-reference: Pipeline -------- -.. warning:: - - The pipeline cannot operate on collections with documents that contain - any of the following "special fields:" ``MinKey``, ``MaxKey``, - ``EOO``, ``Undefined``, ``DBRef``, ``Code``. +.. include:: /includes/warning-aggregation-types.rst Pipeline operators appear in an array. Conceptually, documents pass through these operators in a sequence. All examples in this section assume that the @@ -53,16 +54,16 @@ contains documents that resemble the following: .. code-block:: javascript { - title : "this is my title" , - author : "bob" , - posted : new Date() , - pageViews : 5 , - tags : [ "fun" , "good" , "fun" ] , - comments : [ - { author :"joe" , text : "this is cool" } , - { author :"sam" , text : "this is bad" } - ], - other : { foo : 5 } + title : "this is my title" , + author : "bob" , + posted : new Date() , + pageViews : 5 , + tags : [ "fun" , "good" , "fun" ] , + comments : [ + { author :"joe" , text : "this is cool" } , + { author :"sam" , text : "this is bad" } + ], + other : { foo : 5 } } The current pipeline operators are: @@ -70,11 +71,10 @@ The current pipeline operators are: .. pipeline:: $project Reshapes a document stream by renaming, adding, or removing - fields. Also use :pipeline:`$project` to create computed values - or sub-objects. Use :pipeline:`$project` to: + fields. 
Also use :pipeline:`$project` to create computed values or + sub-objects. Use :pipeline:`$project` to: - Include fields from the original document. - - Exclude fields from the original document. - Insert computed fields. - Rename fields. - Create and populate fields that hold sub-documents. @@ -94,14 +94,12 @@ The current pipeline operators are: This operation includes the ``title`` field and the ``author`` field in the document that returns from the aggregation - :term:`pipeline`. Because the first field specification is an - inclusion, :pipeline:`$project` is in "inclusive" mode, and will - return only the fields explicitly included (and the ``_id`` field.) + :term:`pipeline`. .. note:: - The ``_id`` field is always included by default in the inclusive - mode. You may explicitly exclude ``_id`` as follows: + The ``_id`` field is always included by default. You may + explicitly exclude ``_id`` as follows: .. code-block:: javascript @@ -116,39 +114,6 @@ The current pipeline operators are: Here, the projection excludes the ``_id`` field but includes the ``title`` and ``author`` fields. - .. warning:: - - In the inclusive mode, you may exclude *no* fields other than - the ``_id`` field. - - A field inclusion in a projection will not create a field that - does not exist in a document from the collection. - - In the exclusion mode, the :pipeline:`$project` returns all - fields *except* the ones that are explicitly excluded. Consider the - following example: - - .. code-block:: javascript - - db.article.aggregate( - { $project : { - comments : 0 , - other : 0 - }} - ); - - Here, the projection propagates all fields except for the - ``comments`` and ``other`` fields along the pipeline. - - The :pipeline:`$project` enters **exclusive** mode when the - first field in the projection (that isn't ``_id``) is an exclusion. - When the first field is an **inclusion** the projection is inclusive. - - .. 
note:: - - In exclusive mode, no fields may be explicitly included by - declaring them with a ``: 1`` in the projection statement. - Projections can also add computed fields to the document stream passing through the pipeline. A computed field can use any of the :ref:`expression operators `. @@ -169,9 +134,8 @@ The current pipeline operators are: .. note:: - You must enclose the expression that defines the computed field in - braces, so that it resembles an object and conforms to - JavaScript syntax. + You must enclose the expression that defines the computed field + in braces, so that the expression is a valid object. You may also use :pipeline:`$project` to rename fields. Consider the following example: @@ -186,7 +150,6 @@ The current pipeline operators are: }} ); - This operation renames the ``pageViews`` field to ``page_views``, and renames the ``foo`` field in the ``other`` sub-document as the top-level field ``bar``. The field references used for @@ -218,21 +181,14 @@ The current pipeline operators are: - ``pv`` which includes and renames the ``pageViews`` from the top level of the original documents. + - ``foo`` which includes the value of ``other.foo`` from the original documents. + - ``dpv`` which is a computed field that adds 10 to the value of the ``pageViews`` field in the original document using the :expression:`$add` aggregation expression. - .. note:: - - Because of the :term:`BSON` requirement to preserve field order, - projections output fields in the same order that they appeared in the - input. Furthermore, when the aggregation framework adds computed - values to a document, they will follow all fields from the - original and appear in the order that they appeared in the - :pipeline:`$project` statement. - .. 
pipeline:: $match Provides a query-like interface to filter documents out of the @@ -271,8 +227,6 @@ The current pipeline operators are: Here, all documents return when the ``score`` field holds a value that is greater than 50 and less than or equal to 90. - .. seealso:: :mongodb:operator:`$gt` and :mongodb:operator:`$lte`. - .. note:: Place the :pipeline:`$match` as early in the aggregation @@ -286,8 +240,9 @@ The current pipeline operators are: .. warning:: - You cannot use :mongodb:operator:`$where` operations in - :pipeline:`$match` queries as part of the aggregation pipeline. + You cannot use :mongodb:operator:`$where` or :ref:`geospatial + operations ` in :pipeline:`$match` + queries as part of the aggregation pipeline. .. pipeline:: $limit @@ -405,7 +360,7 @@ The current pipeline operators are: - If you specify a target field for :pipeline:`$unwind` that does not exist in an input document, the pipeline ignores the - input document, and will generates no result documents. + input document, and will generate no result documents. - If you specify a target field for :pipeline:`$unwind` that is not an array, :mongodb:func:`aggregate()` generates an error. @@ -428,6 +383,9 @@ The current pipeline operators are: a single field from the documents in the pipeline, a previously computed value, or an aggregate key made up from several incoming fields. + With the exception of the ``_id`` field, :pipeline:`$group` cannot + output nested documents. + Every group expression must specify an ``_id`` field. You may specify the ``_id`` field as a dotted field path reference, a document with multiple fields enclosed in @@ -463,21 +421,22 @@ The current pipeline operators are: Returns an array of all the values found in the selected field among the documents in that group. *Every unique value only - appears once* in the result set. + appears once* in the result set. There is no ordering guarantee + for the output documents. .. 
group:: $first - Returns the first value it sees for its group. + Returns the first value it encounters for its group. .. note:: - Only use :group:`$first` when the :pipeline:`$group` - follows an :pipeline:`$sort` operation. Otherwise, the - result of this operation is unpredictable. + Only use :group:`$first` when the :pipeline:`$group` follows + a :pipeline:`$sort` operation. Otherwise, the result of this + operation is unpredictable. .. group:: $last - Returns the last value it sees for its group. + Returns the last value it encounters for its group. .. note:: @@ -540,11 +499,9 @@ The current pipeline operators are: ````, according to the key and specification in the ``{ }`` document. - The sorting configuration is identical to the specification of an - :term:`index`. Within a document, specify a field or fields that - you want to sort by and a value of ``1`` or ``-1`` to specify - an ascending or descending sort respectively. See the following - example: + Specify the sort in a document with a field or fields that you want + to sort by and a value of ``1`` or ``-1`` to specify an ascending + or descending sort respectively, as in the following example: .. code-block:: javascript @@ -561,10 +518,15 @@ The current pipeline operators are: The :pipeline:`$sort` cannot begin sorting documents until previous operators in the pipeline have returned all output. + .. TODO mention the importance of order preserving objects + .. warning:: Unless the :pipeline:`$sort` operator can use an index, in the current release, the sort must fit within memory. This may cause problems when sorting large numbers of documents. + .. TODO if a sort precedes the first $group in a sharded system, + all documents must go to the mongos for sorting. + .. OMITTED: Pending SERVER-3254, $out will not be in 2.2. .. .. .. pipeline:: $out @@ -603,8 +565,8 @@ return Booleans as results. These operators convert non-booleans to Boolean values according to the BSON standards.
Here, "Null," undefined, and "zero" values - become "false," while non-zero numeric values, strings, dates, - objects, and other types become "true." + become ``false``, while non-zero numeric values and all other types, + such as strings, dates, and objects, become ``true``. .. expression:: $and @@ -614,13 +576,8 @@ return Booleans as results. .. note:: :expression:`$and` uses short-circuit logic: the operation - stops evaluation after encountering the first ``false`` expression. - -.. expression:: $not - - Returns the boolean opposite value passed to it. When passed a - ``true`` value, :expression:`$not` returns ``false``; when passed - a ``false`` value, :expression:`$not` returns ``true``. + stops evaluation after encountering the first ``false`` + expression. .. expression:: $or @@ -632,6 +589,12 @@ return Booleans as results. :expression:`$or` uses short-circuit logic: the operation stops evaluation after encountering the first ``true`` expression. +.. expression:: $not + + Returns the boolean opposite of the value passed to it. When passed a + ``true`` value, :expression:`$not` returns ``false``; when passed + a ``false`` value, :expression:`$not` returns ``true``. + Comparison Operators ~~~~~~~~~~~~~~~~~~~~ @@ -645,8 +608,8 @@ returns an integer. .. expression:: $cmp - Takes two values in an array, either a pair of numbers, a pair of strings, - or a pair of dates, and returns an integer. The returned value is: + Takes two values in an array and returns an integer. The returned + value is: - A negative number if the first value is less than the second. @@ -656,8 +619,8 @@ returns an integer. .. expression:: $eq - Takes two values in an array, either a pair of numbers, a pair of strings, - or a pair of dates, and returns a boolean. The returned value is: + Takes two values in an array and returns a boolean. The returned + value is: - ``true`` when the values are equivalent. @@ -665,8 +628,8 @@ returns an integer. ..
expression:: $gt - Takes two values in an array, either a pair of numbers, a pair of strings, - or a pair of dates, and returns an integer. The returned value is: + Takes two values in an array and returns a boolean. The returned + value is: - ``true`` when the first value is *greater than* the second value. @@ -675,8 +638,8 @@ returns an integer. .. expression:: $gte - Takes two values in an array, either a pair of numbers, a pair of strings, - or a pair of dates, and returns an integer. The returned value is: + Takes two values in an array and returns a boolean. The returned + value is: - ``true`` when the first value is *greater than or equal* to the second value. @@ -685,8 +648,8 @@ returns an integer. .. expression:: $lt - Takes two values in an array, either a pair of numbers, a pair of strings, - or a pair of dates, and returns an integer. The returned value is: + Takes two values in an array and returns a boolean. The returned + value is: - ``true`` when the first value is *less than* the second value. @@ -695,8 +658,8 @@ returns an integer. .. expression:: $lte - Takes two values in an array, either a pair of numbers, a pair of strings, - or a pair of dates, and returns an integer. The returned value is: + Takes two values in an array and returns a boolean. The returned + value is: - ``true`` when the first value is *less than or equal to* the second value. @@ -706,28 +669,23 @@ returns an integer. .. expression:: $ne - Takes two values in an array, either a pair of numbers, a pair of strings, - or a pair of dates, and returns an integer. The returned value is: + Takes two values in an array and returns a boolean. The returned value + is: - ``true`` when the values are **not equivalent**. - - ``false`` when the values are equivalent. + - ``false`` when the values are **equivalent**. Arithmetic Operators ~~~~~~~~~~~~~~~~~~~~ +These operators only support numbers. + .. expression:: $add Takes an array of numbers and adds them together, returning the sum.
- - If the array contains a string, :expression:`$add` concatenates - all items and returns the result as a string. - - - If the array contains a date and no strings, :expression:`$add` - treats all numbers as a quantity of days and adds them to the - date. The result has the date type. - .. expression:: $divide Takes an array that contains a pair of numbers and returns the @@ -750,26 +708,20 @@ Arithmetic Operators Takes an array that contains a pair of numbers and subtracts the second from the first, returning their difference. - .. note:: - - If the first entry in the array is a date, - :expression:`$subtract` treats the second entry, a number, as a - number of days and decrements the date, returning the resulting - date. - - String Operators ~~~~~~~~~~~~~~~~ These operators manipulate strings within projection expressions. +.. TODO the following may get cut. + .. expression:: $strcasecmp - Takes in two strings. Returns a number, of JavaScript type "long." - :expression:`$strcasecmp` is positive if the first string is - "greater than" the second and negative if the first string is "less - than" the second. :expression:`$strcasecmp` returns 0 if the - strings are identical. + Takes in two strings. Returns a number. :expression:`$strcasecmp` + is positive if the first string is "greater than" the second and + negative if the first string is "less than" the + second. :expression:`$strcasecmp` returns 0 if the strings are + identical. .. note:: @@ -804,8 +756,6 @@ These operators manipulate strings within projection expressions. :expression:`$toUpper` may not make sense when applied to glyphs outside the Roman alphabet. -.. seealso:: ":expression:`$add`", which concatenates strings. - Date Operators ~~~~~~~~~~~~~~ @@ -821,7 +771,7 @@ argument and return a JavaScript "long" number. .. expression:: $dayOfWeek Takes a date and returns the day of the week as a number - between 1 and 7. + between 1 (Sunday) and 7 (Saturday). ..
expression:: $dayOfYear @@ -836,13 +786,14 @@ argument and return a JavaScript "long" number. Takes a date and returns the minute between 0 and 59. -.. expression:: $month +.. expression:: $second - Takes a date and returns the month as a number between 1 and 12. + Takes a date and returns the second between 0 and 59, but can be 60 + to account for leap seconds. -.. expression:: $second +.. expression:: $month - Takes a date and returns the second between 0 and 59. + Takes a date and returns the month as a number between 1 and 12. .. expression:: $week @@ -856,12 +807,11 @@ argument and return a JavaScript "long" number. .. expression:: $year - Takes a date and returns a four digit number. + Takes a date and returns the full year. .. seealso:: ":expression:`$add`" and ":expression:`$subtract` can also manipulate date objects. - Multi-Expressions ~~~~~~~~~~~~~~~~~ diff --git a/source/reference/commands.txt b/source/reference/commands.txt index f6be06e8ca1..c7b6cae252f 100644 --- a/source/reference/commands.txt +++ b/source/reference/commands.txt @@ -643,7 +643,7 @@ Aggregation .. code-block:: javascript - { aggregate: "[collection]", pipeline: ["pipeline"] } + { aggregate: "[collection]", pipeline: [pipeline] } Where ``[collection]`` specifies the name of the collection that contains the data that you wish to aggregate. The ``pipeline`` @@ -654,15 +654,15 @@ Aggregation .. 
code-block:: javascript db.runCommand( - { aggregate : “article”, pipeline : [ + { aggregate : "article", pipeline : [ { $project : { author : 1, tags : 1, } }, - { $unwind : “$tags” }, + { $unwind : "$tags" }, { $group : { _id : { tags : 1 }, - authors : { $addToSet : “$author” } + authors : { $addToSet : "$author" } } } ] } ); @@ -678,10 +678,10 @@ Aggregation author : 1, tags : 1, } }, - { $unwind : “$tags” }, + { $unwind : "$tags" }, { $group : { _id : { tags : 1 }, - authors : { $addToSet : “$author” } + authors : { $addToSet : "$author" } } } ); diff --git a/source/reference/glossary.txt b/source/reference/glossary.txt index cb756d07ced..609bd5c511a 100644 --- a/source/reference/glossary.txt +++ b/source/reference/glossary.txt @@ -751,7 +751,8 @@ Glossary accumulator An :term:`expression` in the :term:`aggregation framework` that maintains state between documents in the :term:`aggregation` - :term:`pipeline`. + :term:`pipeline`. See: :agg:pipeline:`$group` for a list of + accumulator operations. CRUD Create, read, update, and delete. The fundamental operations