|
1 |
| -To run an explain plan for this aggregation use |
2 |
| -`PyMongoExplain <https://pypi.org/project/pymongoexplain/>`_, |
3 |
| -a companion library for PyMongo. It allows you to explain any CRUD operation |
4 |
| -by providing a few convenience classes: |
| 1 | +.. _pymongo-aggregation: |
5 | 2 |
|
6 |
| -.. code-block:: python |
| 3 | +==================================== |
| 4 | +Transform Your Data with Aggregation |
| 5 | +==================================== |
| 6 | + |
| 7 | +.. facet:: |
| 8 | + :name: genre |
| 9 | + :values: reference |
| 10 | + |
| 11 | +.. meta:: |
| 12 | + :keywords: code example, transform, computed, pipeline |
| 13 | + :description: Learn how to use {+driver-short+} to perform aggregation operations. |
| 14 | + |
| 15 | +.. contents:: On this page |
| 16 | + :local: |
| 17 | + :backlinks: none |
| 18 | + :depth: 2 |
| 19 | + :class: singlecol |
| 20 | + |
| 21 | +.. toctree:: |
| 22 | + :titlesonly: |
| 23 | + :maxdepth: 1 |
| 24 | + |
| 25 | + /aggregation/aggregation-tutorials |
| 26 | + |
| 27 | +Overview |
| 28 | +-------- |
| 29 | + |
| 30 | +In this guide, you can learn how to use {+driver-short+} to perform |
| 31 | +**aggregation operations**. |
| 32 | + |
| 33 | +Aggregation operations process data in your MongoDB collections and |
| 34 | +return computed results. The MongoDB Aggregation framework, which is |
| 35 | +part of the Query API, is modeled on the concept of data processing |
| 36 | +pipelines. Documents enter a pipeline that contains one or more stages, |
| 37 | +and this pipeline transforms the documents into an aggregated result. |
| 38 | + |
| 39 | +An aggregation operation is similar to a car factory. A car factory has |
| 40 | +an assembly line, which contains assembly stations with specialized |
| 41 | +tools to do specific jobs, like drills and welders. Raw parts enter the |
| 42 | +factory, and then the assembly line transforms and assembles them into a |
| 43 | +finished product. |
| 44 | + |
| 45 | +The **aggregation pipeline** is the assembly line, **aggregation stages** are the |
| 46 | +assembly stations, and **operator expressions** are the |
| 47 | +specialized tools. |
| 48 | + |
| 49 | +Aggregation Versus Find Operations |
| 50 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 51 | + |
| 52 | +You can use find operations to perform the following actions: |
| 53 | + |
| 54 | +- Select which documents to return |
| 55 | +- Select which fields to return |
| 56 | +- Sort the results |
| 57 | + |
| 58 | +You can use aggregation operations to perform the following actions: |
| 59 | + |
| 60 | +- Perform find operations |
| 61 | +- Rename fields |
| 62 | +- Calculate fields |
| 63 | +- Summarize data |
| 64 | +- Group values |
| 65 | + |
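To make the two lists above concrete, the following sketch expresses one query as a find operation and again as a pipeline. It assumes `collection` is a PyMongo `Collection` whose documents have `name`, `borough`, and `cuisine` fields. The function names and the `restaurant` alias are hypothetical and chosen only for illustration.

```python
# Sketch only: `collection` is assumed to be a PyMongo Collection that holds
# documents with `name`, `borough`, and `cuisine` fields.


def find_bakeries(collection):
    """Select documents, select fields, and sort by using a find operation."""
    cursor = collection.find(
        {"cuisine": "Bakery"},                # which documents to return
        {"_id": 0, "name": 1, "borough": 1},  # which fields to return
    )
    return cursor.sort("name", 1)             # sort ascending by name


def build_bakery_pipeline():
    """Build a pipeline that does the same work and also renames a field."""
    return [
        {"$match": {"cuisine": "Bakery"}},
        # Rename `name` to the hypothetical alias `restaurant`
        {"$project": {"_id": 0, "restaurant": "$name", "borough": 1}},
        {"$sort": {"restaurant": 1}},
    ]


def aggregate_bakeries(collection):
    """Run the pipeline and return a cursor over the reshaped documents."""
    return collection.aggregate(build_bakery_pipeline())
```

Building the pipeline in its own function keeps the stage list easy to inspect before you send it to the server.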
| 66 | +Limitations |
| 67 | +~~~~~~~~~~~ |
7 | 68 |
|
8 |
| - >>> from pymongoexplain import ExplainableCollection |
9 |
| - >>> ExplainableCollection(collection).aggregate(pipeline) |
10 |
| - {'ok': 1.0, 'queryPlanner': [...]} |
| 69 | +Keep the following limitations in mind when using aggregation operations: |
11 | 70 |
|
12 |
| -Or, use the ``~pymongo.database.Database.command`` method: |
| 71 | +- Returned documents must not violate the |
| 72 | + :manual:`BSON document size limit </reference/limits/#mongodb-limit-BSON-Document-Size>` |
| 73 | + of 16 megabytes. |
| 74 | +- Pipeline stages have a memory limit of 100 megabytes by default. You can exceed this |
| 75 | + limit by setting the ``allowDiskUse`` keyword argument of the |
| 76 | + ``aggregate()`` method to ``True``. |
| 77 | + |
| 78 | +.. important:: $graphLookup exception |
| 79 | + |
| 80 | + The :manual:`$graphLookup |
| 81 | + </reference/operator/aggregation/graphLookup/>` stage has a strict |
| 82 | + memory limit of 100 megabytes and ignores the ``allowDiskUse`` parameter. |
| 83 | + |
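The memory limit described above can be lifted per operation. The following is a minimal sketch, assuming `collection` is a PyMongo `Collection`. The function name `count_by_cuisine` and its `allow_disk_use` parameter are hypothetical; `allowDiskUse` is the keyword argument named in the list above.

```python
# Sketch only: `collection` is assumed to be a PyMongo Collection.


def count_by_cuisine(collection, allow_disk_use=True):
    """Group every document by cuisine, letting stages write to disk if needed."""
    pipeline = [
        {"$group": {"_id": "$cuisine", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]
    # allowDiskUse is passed along with the aggregate operation
    cursor = collection.aggregate(pipeline, allowDiskUse=allow_disk_use)
    return list(cursor)
```

Because `$graphLookup` ignores this option, a pipeline that contains that stage still needs to fit within the 100-megabyte limit.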
| 84 | +Aggregation Example |
| 85 | +------------------- |
| 86 | + |
| 87 | +.. note:: |
| 88 | + |
| 89 | + This example uses the ``sample_restaurants.restaurants`` collection |
| 90 | + from the :atlas:`Atlas sample datasets </sample-data>`. To learn how to create a |
| 91 | + free MongoDB Atlas cluster and load the sample datasets, see :ref:`<pymongo-get-started>`. |
| 92 | + |
| 93 | +To perform an aggregation, pass a list of aggregation stages to the |
| 94 | +``collection.aggregate()`` method. |
| 95 | + |
| 96 | +The following code example produces a count of the number of bakeries in each borough |
| 97 | +of New York. To do so, it uses an aggregation pipeline with the following stages: |
| 98 | + |
| 99 | +- A :manual:`$match </reference/operator/aggregation/match/>` stage to filter for documents |
| 100 | + whose ``cuisine`` field contains the value ``"Bakery"``. |
| 101 | + |
| 102 | +- A :manual:`$group </reference/operator/aggregation/group/>` stage to group the matching |
| 103 | + documents by the ``borough`` field, accumulating a count of documents for each distinct |
| 104 | + value. |
13 | 105 |
|
14 | 106 | .. code-block:: python
|
| 107 | + :copyable: true |
| 108 | + |
| 109 | + # Define an aggregation pipeline with a match stage and a group stage |
| 110 | + pipeline = [ |
| 111 | + { "$match": { "cuisine": "Bakery" } }, |
| 112 | + { "$group": { "_id": "$borough", "count": { "$sum": 1 } } } |
| 113 | + ] |
| 114 | + |
| 115 | + # Execute the aggregation |
| 116 | + agg_cursor = collection.aggregate(pipeline) |
| 117 | + |
| 118 | + # Print the aggregated results |
| 119 | + for document in agg_cursor: |
| 120 | + print(document) |
| 121 | + |
| 122 | +The preceding code example produces output similar to the following: |
| 123 | + |
| 124 | +.. code-block:: javascript |
| 125 | + |
| 126 | + {'_id': 'Bronx', 'count': 71} |
| 127 | + {'_id': 'Brooklyn', 'count': 173} |
| 128 | + {'_id': 'Missing', 'count': 2} |
| 129 | + {'_id': 'Manhattan', 'count': 221} |
| 130 | + {'_id': 'Queens', 'count': 204} |
| 131 | + {'_id': 'Staten Island', 'count': 20} |
| 132 | + |
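The example output above arrives in no guaranteed order. One possible variation, sketched here, appends a `$sort` stage and collects the results into a dictionary. It assumes `collection` is a PyMongo `Collection` for `sample_restaurants.restaurants`. The function name `bakeries_per_borough` is hypothetical.

```python
# Sketch only: `collection` is assumed to be a PyMongo Collection for the
# sample_restaurants.restaurants collection.


def bakeries_per_borough(collection):
    """Return a {borough: count} mapping, ordered from most to fewest bakeries."""
    pipeline = [
        {"$match": {"cuisine": "Bakery"}},
        {"$group": {"_id": "$borough", "count": {"$sum": 1}}},
        # Sort the grouped documents so the largest counts come first
        {"$sort": {"count": -1}},
    ]
    # Python dictionaries preserve insertion order, so the sort order survives
    return {doc["_id"]: doc["count"] for doc in collection.aggregate(pipeline)}
```

Sorting inside the pipeline lets the server do the ordering, so the client only iterates the cursor once.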
| 133 | +Explain an Aggregation |
| 134 | +~~~~~~~~~~~~~~~~~~~~~~ |
| 135 | + |
| 136 | +To view information about how MongoDB executes your operation, you can |
| 137 | +instruct MongoDB to **explain** it. When MongoDB explains an operation, it returns |
| 138 | +**execution plans** and performance statistics. An execution |
| 139 | +plan is a potential way MongoDB can complete an operation. |
| 140 | +When you instruct MongoDB to explain an operation, it returns both the |
| 141 | +plan MongoDB executed and any rejected execution plans. |
| 142 | + |
| 143 | +To explain an aggregation operation, you can use either the |
| 144 | +`PyMongoExplain <https://pypi.org/project/pymongoexplain/>`__ library or a database |
| 145 | +command. Select the corresponding tab below to see an example of each method. |
| 146 | + |
| 147 | +.. tabs:: |
| 148 | + |
| 149 | + .. tab:: PyMongoExplain |
| 150 | + :tabid: pymongoexplain |
| 151 | + |
| 152 | + Use pip to install the ``pymongoexplain`` library, as shown in the |
| 153 | + following example: |
| 154 | + |
| 155 | + .. code-block:: sh |
| 156 | + |
| 157 | + python3 -m pip install pymongoexplain |
| 158 | + |
| 159 | + The following code example runs the preceding aggregation example and prints the explanation |
| 160 | + returned by MongoDB: |
| 161 | + |
| 162 | + .. io-code-block:: |
| 163 | + :copyable: true |
| 164 | + |
| 165 | + .. input:: |
| 166 | + :language: python |
| 167 | + |
| 168 | + # Define an aggregation pipeline with a match stage and a group stage |
| 169 | + pipeline = [ |
| 170 | + { "$match": { "cuisine": "Bakery" } }, |
| 171 | + { "$group": { "_id": "$borough", "count": { "$sum": 1 } } } |
| 172 | + ] |
| 173 | + |
| 174 | + # Execute the operation and print the explanation |
| 175 | + result = ExplainableCollection(collection).aggregate(pipeline) |
| 176 | + print(result) |
| 177 | + |
| 178 | + .. output:: |
| 179 | + :language: javascript |
| 180 | + :visible: false |
| 181 | + |
| 182 | + ... |
| 183 | + 'winningPlan': {'queryPlan': {'stage': 'GROUP', |
| 184 | + 'planNodeId': 3, |
| 185 | + 'inputStage': {'stage': 'COLLSCAN', |
| 186 | + 'planNodeId': 1, |
| 187 | + 'filter': {'cuisine': {'$eq': 'Bakery'}}, |
| 188 | + 'direction': 'forward'}}, |
| 189 | + ... |
| 190 | + |
| 191 | + .. tab:: Database Command |
| 192 | + :tabid: db-command |
| 193 | + |
| 194 | + The following code example runs the preceding aggregation example and prints the explanation |
| 195 | + returned by MongoDB: |
| 196 | + |
| 197 | + .. io-code-block:: |
| 198 | + :copyable: true |
| 199 | + |
| 200 | + .. input:: |
| 201 | + :language: python |
| 202 | + |
| 203 | + # Define an aggregation pipeline with a match stage and a group stage |
| 204 | + pipeline = [ |
| 205 | + { "$match": { "cuisine": "Bakery" } }, |
| 206 | + { "$group": { "_id": "$borough", "count": { "$sum": 1 } } } |
| 207 | + ] |
| 208 | + |
| 209 | + # Execute the operation and print the explanation |
| 210 | + result = database.command("aggregate", "collection", pipeline=pipeline, explain=True) |
| 211 | + print(result) |
| 212 | + |
| 213 | + .. output:: |
| 214 | + :language: javascript |
| 215 | + |
| 216 | + ... |
| 217 | + 'command': {'aggregate': 'collection', |
| 218 | + 'pipeline': [{'$match': {'cuisine': 'Bakery'}}, |
| 219 | + {'$group': {'_id': '$borough', |
| 220 | + 'count': {'$sum': 1}}}], |
| 221 | + 'explain': True, |
| 222 | + ... |
| 223 | + |
| 224 | +.. tip:: |
| 225 | + |
| 226 | + You can use Python's ``pprint`` module to make explanation results easier to read: |
| 227 | + |
| 228 | + .. code-block:: python |
| 229 | + |
| 230 | + import pprint |
| 231 | + ... |
| 232 | + pprint.pp(result) |
| 233 | + |
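The two explain approaches in the tabs above can be wrapped in small helpers, sketched below with the imports they need. This assumes `database` is a PyMongo `Database`, `collection` is a PyMongo `Collection`, and the `pymongoexplain` library is installed for the second helper. The helper names are hypothetical, and the shape of the returned explanation depends on your server version.

```python
import pprint

# The same pipeline as the preceding aggregation example
PIPELINE = [
    {"$match": {"cuisine": "Bakery"}},
    {"$group": {"_id": "$borough", "count": {"$sum": 1}}},
]


def explain_with_command(database, collection_name, pipeline=PIPELINE):
    """Ask the server to explain the aggregation by using a database command."""
    return database.command(
        "aggregate", collection_name, pipeline=pipeline, explain=True
    )


def explain_with_pymongoexplain(collection, pipeline=PIPELINE):
    """Explain the aggregation by using the PyMongoExplain companion library."""
    # Imported here so the other helpers work without the library installed
    from pymongoexplain import ExplainableCollection

    return ExplainableCollection(collection).aggregate(pipeline)


def print_explanation(result):
    """Pretty-print an explanation document."""
    pprint.pp(result)
```

For example, `print_explanation(explain_with_command(database, "restaurants"))` would print the explanation for the sample collection, assuming `database` refers to `sample_restaurants`.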
| 234 | +Additional Information |
| 235 | +---------------------- |
| 236 | + |
| 237 | +MongoDB Server Manual |
| 238 | +~~~~~~~~~~~~~~~~~~~~~ |
| 239 | + |
| 240 | +To view a full list of expression operators, see :manual:`Aggregation |
| 241 | +Operators </reference/operator/aggregation/>`. |
| 242 | + |
| 243 | +To learn about assembling an aggregation pipeline and view examples, see |
| 244 | +:manual:`Aggregation Pipeline </core/aggregation-pipeline/>`. |
| 245 | + |
| 246 | +To learn more about creating pipeline stages, see :manual:`Aggregation |
| 247 | +Stages </reference/operator/aggregation-pipeline/>`. |
| 248 | + |
| 249 | +To learn more about explaining MongoDB operations, see |
| 250 | +:manual:`Explain Output </reference/explain-results/>` and |
| 251 | +:manual:`Query Plans </core/query-plans/>`. |
| 252 | + |
| 253 | +Aggregation Tutorials |
| 254 | +~~~~~~~~~~~~~~~~~~~~~ |
| 255 | + |
| 256 | +To view step-by-step explanations of common aggregation tasks, see |
| 257 | +:ref:`pymongo-aggregation-tutorials-landing`. |
| 258 | + |
| 259 | +API Documentation |
| 260 | +~~~~~~~~~~~~~~~~~ |
| 261 | + |
| 262 | +For more information about executing aggregation operations with {+driver-short+}, |
| 263 | +see the following API documentation: |
15 | 264 |
|
16 |
| - >>> db.command('aggregate', 'things', pipeline=pipeline, explain=True) |
17 |
| - {'ok': 1.0, 'stages': [...]} |
| 265 | +- `aggregate() <{+api-root+}pymongo/collection.html#pymongo.collection.Collection.aggregate>`__ |