From 6bf0d7560888889962a994c58efdd64e11e43701 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Sat, 24 Mar 2012 16:27:37 -0400 Subject: [PATCH 01/17] Begin work on gaming: user state Signed-off-by: Rick Copeland --- .../use-cases/gaming-user-state.txt | 160 ++++++++++++++++++ source/applications/use-cases/index.txt | 43 +++++ .../use-cases/use-case-template.txt | 58 +++++++ 3 files changed, 261 insertions(+) create mode 100644 source/applications/use-cases/gaming-user-state.txt create mode 100644 source/applications/use-cases/index.txt create mode 100644 source/applications/use-cases/use-case-template.txt diff --git a/source/applications/use-cases/gaming-user-state.txt b/source/applications/use-cases/gaming-user-state.txt new file mode 100644 index 00000000000..8d695ca6bcf --- /dev/null +++ b/source/applications/use-cases/gaming-user-state.txt @@ -0,0 +1,160 @@ +================================= +Online Gaming: Storing User State +================================= + +.. default-domain:: mongodb + +Overview +-------- + +This document outlines the basic patterns and principles for using +MongoDB as a persistent storage engine for user state data from an online +game, particularly one that contains role-playing characteristics. + +Problem +~~~~~~~ + +In designing an online game, you need to store various +data about the user's character. Some of the attributes might include: + +Character attributes + These might include intrinsic characteristics such as strength, + dexterity, charisma, etc., as well as variable characteristics such + as health, mana (if your game includes magic), etc. +Character inventory + If your game includes the ability for the user to carry around + objects, you will need to keep track of the items carried. +Character location / relationship to the game world + If your game allows the user to move their character from one + location to another, this information needs to be stored as well.
+ +In addition, you need to store all this data for large numbers of +users who might be playing the game simultaneously, and this data +needs to be both readable and writable with minimal latency in order +to ensure responsiveness during gameplay. + +Another consideration when designing the persistence backend for an +online game is its flexibility. Particularly in early releases of a +game, you may wish to change gameplay mechanics significantly as you +receive feedback from your users. As you implement these changes, you +need to be able to migrate your persistent data from one format to +another with minimal (or no) downtime. + +Solution +~~~~~~~~ + +The solution presented by this case study assumes that read and +write performance for the user state are equally important, and that both +reads and writes must complete with minimal latency. + +Schema Design +~~~~~~~~~~~~~ + +Ultimately, the details of your schema depend on the particular +design of your game. When designing your schema for the user state, +you should attempt to encapsulate all the commonly used data into the +user object in order to minimize the number of queries to the database +and the number of seeks in a query. If you can manage to +encapsulate all relevant user state into a single document, this +satisfies both of these criteria. + +In a role-playing game, then, a typical user state document might look +like the following: + +.. 
code-block:: javascript + + { + _id: ObjectId('...'), + name: 'Tim', + character: { + intrinsics: { + strength: 10, + dexterity: 16, + intelligence: 17, + charisma: 8 }, + class: 'mage', + health: 212, + mana: 152 + }, + location: { + id: 'maze-1', + description: 'a maze of twisty little passages...', + exits: {n:'maze-2', s:'maze-1', e:'maze-3'}, + contents: [ + { qty:1, id:ObjectId('...'), name:'grue' }, + { qty:1, id:ObjectId('...'), name:'Tim' }, + { qty:1, id:ObjectId('...'), name:'scroll of cause fear' }] + }, + armor: [ + { id:ObjectId('...'), region:'head'}, + { id:ObjectId('...'), region:'body'}, + { id:ObjectId('...'), region:'hands'}, + { id:ObjectId('...'), region:'feet'}], + weapons: [ {id:ObjectId('...'), hand:'both'} ], + inventory: [ + { qty:1, id:ObjectId('...'), name:'backpack', contents: [ + { qty:4, id:ObjectId('...'), name: 'potion of healing'}, + { qty:1, id:ObjectId('...'), name: 'scroll of magic mapping'}, + { qty:2, id:ObjectId('...'), name: 'c-rations'} ]}, + { qty:1, id:ObjectId('...'), name:"wizard's hat", bonus:3}, + { qty:1, id:ObjectId('...'), name:"wizard's robe", bonus:0}, + { qty:1, id:ObjectId('...'), name:"old boots", bonus:0}, + { qty:1, id:ObjectId('...'), name:"quarterstaff", bonus:2}, + { qty:523, id:ObjectId('...'), name:"gold" } ] + } + +There are a few things to note about this document. First, information +about the character's location in the game is encapsulated under the +``location`` attribute. Note in particular that all of the information +necessary to render the room is encapsulated within the user's state +document. This allows the game system to render the room without +making a second query to the database to get room information. + +Second, notice that the ``armor`` and ``weapons`` attributes contain +little information about the actual items being worn or carried. This +information is actually stored under the ``inventory`` property. 
Since +the inventory information is stored in the same document, there is no +need to replicate the detailed information about each item into the +``armor`` and ``weapons`` properties. + +Finally, note that ``inventory`` contains the item details necessary +for rendering each item in the character's possession, including any +enchantments (``bonus``) and quantity (``qty``). Once again, embedding this data +into the character record means you don't have to perform a separate +query to fetch item details necessary for display. + +Operations +---------- + +TODO: summary of the operations section + +The examples that follow use the Python programming language and the +:api:`PyMongo ` :term:`driver` for MongoDB, but you +can implement this system using any language you choose. + +Operation 1 +~~~~~~~~~~~ + +TODO: describe what the operation is (optional) + +Query +````` + +TODO: describe query + +Index Support +````````````` + +TODO: describe indexes to optimize this query + +Sharding +-------- + +Eventually your system's events will exceed the capacity of a single +event logging database instance. In these situations you will want to +use a :term:`shard cluster`, which takes advantage of MongoDB's +:term:`sharding` functionality. This section introduces the unique +sharding concerns for this use case. + +.. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding` wiki + page. diff --git a/source/applications/use-cases/index.txt b/source/applications/use-cases/index.txt new file mode 100644 index 00000000000..5aa6023e118 --- /dev/null +++ b/source/applications/use-cases/index.txt @@ -0,0 +1,43 @@ +:orphan: + +========= +Use Cases +========= + + +Real time Analytics +------------------- + +.. toctree:: + :maxdepth: 2 + + real-time-analytics-storing-log-data + real-time-analytics-pre-aggregated-reports + real-time-analytics-hierarchical-aggregation + +E-Commerce +---------- + +.. toctree:: + :maxdepth: 2 + + ecommerce-product-catalog + ecommerce-inventory-management + ecommerce-category-hierarchy + +Content Management Systems +-------------------------- + +.. 
toctree:: + :maxdepth: 2 + + cms-metadata-and-asset-management + cms-storing-comments + +Online Gaming +------------- + +.. toctree:: + :maxdepth: 2 + + gaming-user-state diff --git a/source/applications/use-cases/use-case-template.txt b/source/applications/use-cases/use-case-template.txt new file mode 100644 index 00000000000..9a8dcef8b44 --- /dev/null +++ b/source/applications/use-cases/use-case-template.txt @@ -0,0 +1,58 @@ +:orphan: + +==================== +TODO: Section: Title +==================== + +.. default-domain:: mongodb + +Overview +-------- + +This document outlines the basic patterns and principles for using +MongoDB as a persistent storage engine for TODO: what are we building? + +Problem +~~~~~~~ + +TODO: describe problem + +Solution +~~~~~~~~ + +TODO: describe assumptions, overview of solution + +Schema Design +~~~~~~~~~~~~~ + +TODO: document collections, doc schemas + +Operations +---------- + +TODO: summary of the operations section + +The examples that follow use the Python programming language and the +:api:`PyMongo ` :term:`driver` for MongoDB, but you +can implement this system using any language you choose. + +Operation 1 +~~~~~~~~~~~ + +TODO: describe what the operation is (optional) + +Query +````` + +TODO: describe query + +Index Support +````````````` + +TODO: describe indexes to optimize this query + +Sharding +-------- + +.. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding` wiki + page.
From c34beed2f44cb440ad0bf2ecbd931c3bfc29a475 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Thu, 29 Mar 2012 15:23:34 -0400 Subject: [PATCH 02/17] First draft complete on gaming: user state Signed-off-by: Rick Copeland --- .../use-cases/gaming-user-state.txt | 317 +++++++++++++++++- 1 file changed, 302 insertions(+), 15 deletions(-) diff --git a/source/applications/use-cases/gaming-user-state.txt b/source/applications/use-cases/gaming-user-state.txt index 8d695ca6bcf..6f71a5808e5 100644 --- a/source/applications/use-cases/gaming-user-state.txt +++ b/source/applications/use-cases/gaming-user-state.txt @@ -80,15 +80,17 @@ like the following: id: 'maze-1', description: 'a maze of twisty little passages...', exits: {n:'maze-2', s:'maze-1', e:'maze-3'}, + players: [ + { id:ObjectId('...'), name:'grue' }, + { id:ObjectId('...'), name:'Tim' } + ], contents: [ - { qty:1, id:ObjectId('...'), name:'grue' }, - { qty:1, id:ObjectId('...'), name:'Tim' }, { qty:1, id:ObjectId('...'), name:'scroll of cause fear' }] }, + gold: 523, armor: [ { id:ObjectId('...'), region:'head'}, { id:ObjectId('...'), region:'body'}, - { id:ObjectId('...'), region:'hands'}, { id:ObjectId('...'), region:'feet'}], weapons: [ {id:ObjectId('...'), hand:'both'} ], inventory: [ @@ -99,8 +101,7 @@ like the following: { qty:1, id:ObjectId('...'), name:"wizard's hat", bonus:3}, { qty:1, id:ObjectId('...'), name:"wizard's robe", bonus:0}, { qty:1, id:ObjectId('...'), name:"old boots", bonus:0}, - { qty:1, id:ObjectId('...'), name:"quarterstaff", bonus:2}, - { qty:523, id:ObjectId('...'), name:"gold" } ] + { qty:1, id:ObjectId('...'), name:"quarterstaff", bonus:2} ] } There are a few things to note about this document. First, information @@ -126,35 +127,321 @@ query to fetch item details necessary for display. 
Operations ---------- -TODO: summary of the operations section +In an online gaming system with the character state stored in a single document, +the primary operations are querying for the character state +document by ``_id``, extracting relevant data for display, and updating various +attributes of the character. This section describes procedures for performing +these queries, extractions, and updates. The examples that follow use the Python programming language and the :api:`PyMongo ` :term:`driver` for MongoDB, but you can implement this system using any language you choose. -Operation 1 -~~~~~~~~~~~ +Load Character Data from MongoDB +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -TODO: describe what the operation is (optional) +The most basic operation in this system is loading the character state. Query ````` -TODO: describe query +Use the following query to load the character document from MongoDB: + +.. code-block:: pycon + + >>> character = db.character.find_one({'_id': user_id}) Index Support ````````````` -TODO: describe indexes to optimize this query +In this case, the default index that MongoDB supplies on the ``_id`` field is +sufficient for good performance of this query. + +Extract Armor and Weapon Data for Display +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To save space, the schema described above stores item details only in +the ``inventory`` attribute, storing ``ObjectId``\ s in other locations. To display +these item details, as on a character summary window, you need to merge the +information from the ``armor`` and ``weapons`` attributes with information from +the ``inventory`` attribute. + +Suppose, for instance, that your code is displaying the armor data using the +following Jinja2 template: + +.. 
code-block:: html + + <div>Armor</div> + <dl> + {% if value.head %} + <dt>Helmet</dt> + <dd>{{value.head[0].description}}</dd> + {% endif %} + {% if value.hands %} + <dt>Gloves</dt> + <dd>{{value.hands[0].description}}</dd> + {% endif %} + {% if value.feet %} + <dt>Boots</dt> + <dd>{{value.feet[0].description}}</dd> + {% endif %} + {% if value.body %} + <dt>Body Armor</dt> + <dd><ul> + {% for piece in value.body %} + <li>{{piece.description}}</li> + {% endfor %} + </ul></dd> + {% endif %} + </dl>
+ + + In this case, you want the various ``description`` fields above to be text +similar to "+3 wizard's hat." The context passed to the template above, then, +would be of the following form: + +.. code-block:: python + + { + "head": [ { "id":..., "description": "+3 wizard's hat" } ], + "hands": [], + "feet": [ { "id":..., "description": "old boots" } ], + "body": [ { "id":..., "description": "wizard's robe" } ], + } + +To build up this structure, use the following helper functions: + +.. code-block:: python + + def get_item_index(inventory): + '''Given an inventory attribute, recursively build up an item + index (including all items contained within other items) + ''' + + result = {} + for item in inventory: + result[item['id']] = item + if 'contents' in item: + result.update(get_item_index(item['contents'])) + return result + + def describe_item(item): + result = dict(item) + if item.get('bonus'): + description = '%+d %s' % (item['bonus'], item['name']) + else: + description = item['name'] + result['description'] = description + return result + + def get_armor_for_display(character, item_index): + '''Given a character document, return an 'armor' value + suitable for display''' + + result = dict(head=[], hands=[], feet=[], body=[]) + for piece in character['armor']: + item = describe_item(item_index[piece['id']]) + result[piece['region']].append(item) + return result + +To actually display the armor, then, you would use the following code: + +.. code-block:: pycon + + >>> item_index = get_item_index( + ... character['inventory'] + character['location']['contents']) + >>> armor = get_armor_for_display(character, item_index) + +Note in particular that you are building an index not only for the items the +character is actually carrying in inventory, but also for the items that the user +might interact with in the room. + +Similarly, to display the weapon information, you need to build a +structure such as the following: + +.. 
code-block:: python + + { + "left": None, + "right": None, + "both": { "description": "+2 quarterstaff" } + } + +The helper function is similar to ``get_armor_for_display``: + +.. code-block:: python + + def get_weapons_for_display(character, item_index): + '''Given a character document, return a 'weapons' value + suitable for display''' + + result = dict(left=None, right=None, both=None) + for piece in character['weapons']: + item = describe_item(item_index[piece['id']]) + result[piece['hand']] = item + return result + +To actually display the weapons, then, you would use the following code: + +.. code-block:: pycon + + >>> weapons = get_weapons_for_display(character, item_index) + +Extract Character Attributes, Inventory, and Room Information for Display +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To display information about the character's attributes, inventory, and +surroundings, you also need to extract fields from the character state. In this +case, however, the schema defined above keeps all the relevant information for +display embedded in those sections of the document. The code for extracting this +data, then, is the following: + +.. code-block:: pycon + + >>> attributes = character['character'] + >>> inventory = character['inventory'] + >>> room_data = character['location'] + +Update Character Inventory +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In your game, suppose the user decides to pick up an item from the room and add +it to their inventory. In this case, you need to update both the character state +and the global location state: + +.. 
code-block:: python + + def pick_up_item(character, item_index, item_id): + '''Transfer an item from the current room to the character's inventory''' + + item = item_index[item_id] + character['inventory'].append(item) + db.character.update( + { '_id': character['_id'] }, + { '$push': { 'inventory': item }, + '$pull': { 'location.contents': { 'id': item_id } } }) + db.location.update( + { '_id': character['location']['id'] }, + { '$pull': { 'contents': { 'id': item_id } } }) + +The code above works for a single-player game, but if you allow multiple +players or non-player characters to pick up items, two characters may try to +pick up the same item simultaneously. To guard against that, use the +``location`` collection to break ties. In this case, the code is now the +following: + +.. code-block:: python + + def pick_up_item(character, item_index, item_id): + '''Transfer an item from the current room to the character's inventory''' + + item = item_index[item_id] + result = db.location.update( + { '_id': character['location']['id'], + 'contents.id': item_id }, + { '$pull': { 'contents': { 'id': item_id } } }, + safe=True) + if not result['updatedExisting']: + raise Conflict() + character['inventory'].append(item) + db.character.update( + { '_id': character['_id'] }, + { '$push': { 'inventory': item }, + '$pull': { 'location.contents': { 'id': item_id } } }) + +Because the ``update`` call above matches the room only while the item is still +present, only one player, non-player character, or monster can pick up the +item. + +Move the Character to a Different Room +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In your game, suppose the user decides to move north. In this case, you need to +update the character state to match the new location: + +.. 
code-block:: python + + def move(character, direction): + '''Move the character to a new location''' + + # Remove character from current location + db.location.update( + {'_id': character['location']['id'] }, + {'$pull': {'players': {'id': character['_id'] } } }) + # Add character to new location, retrieve new location data + new_location = db.location.find_and_modify( + { '_id': character['location']['exits'][direction] }, + { '$push': { 'players': { + 'id': character['_id'], + 'name': character['name'] } } }, + new=True) + # The embedded copy refers to the room by 'id' rather than '_id' + new_location['id'] = new_location.pop('_id') + character['location'] = new_location + db.character.update( + { '_id': character['_id'] }, + { '$set': { 'location': new_location } }) + +Here, note that the code updates the old room, the new room, and the character +document. + +Buy an Item +~~~~~~~~~~~ + +If your character wants to buy an item, you need to add that item to the +character's inventory, decrement the character's gold, increment the shopkeeper's +gold, and update the room: + +.. code-block:: python + + def buy(character, shopkeeper, item_index, item_id, price): + '''Pick up an item, add to the character's inventory, and transfer + payment to the shopkeeper + ''' + + result = db.character.update( + { '_id': character['_id'], + 'gold': { '$gte': price } }, + { '$inc': { 'gold': -price } }, + safe=True ) + if not result['updatedExisting']: + raise InsufficientFunds() + try: + pick_up_item(character, item_index, item_id) + except: + # Add the gold back to the character + db.character.update( + { '_id': character['_id'] }, + { '$inc': { 'gold': price } } ) + raise + character['gold'] -= price + db.character.update( + { '_id': shopkeeper['_id'] }, + { '$inc': { 'gold': price } } ) + +Note that the code above ensures that the character has sufficient gold to pay for +the item, using the same ``updatedExisting`` check used for picking up items. 
The code +also handles the item-pickup race condition, "rolling back" the removal of gold +from the character's wallet if the item cannot be picked up.
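To make the control flow of ``buy()`` concrete without a running database, here is a minimal in-memory sketch. It assumes plain dicts stand in for the character, shopkeeper, and room documents. The helpers ``debit_if_sufficient`` and ``take_from_room`` are hypothetical stand-ins for the two guarded ``update`` calls, and their ``True``/``None`` results play the role of ``updatedExisting``.

```python
# In-memory sketch of the buy() flow (hypothetical helpers; no MongoDB needed).
# Plain dicts stand in for the character, shopkeeper, and room documents.


class InsufficientFunds(Exception):
    pass


class Conflict(Exception):
    pass


def debit_if_sufficient(doc, amount):
    """Mimic update({'gold': {'$gte': amount}}, {'$inc': {'gold': -amount}}):
    change the document only if the guard matches, and report whether it did."""
    if doc['gold'] >= amount:
        doc['gold'] -= amount
        return True
    return False


def take_from_room(room, item_id):
    """Mimic the guarded $pull on the room's contents: remove and return the
    item if it is still present, otherwise return None."""
    for i, item in enumerate(room['contents']):
        if item['id'] == item_id:
            return room['contents'].pop(i)
    return None


def buy(character, shopkeeper, room, item_id, price):
    """Charge first, then try to take the item; refund if it is already gone."""
    if not debit_if_sufficient(character, price):
        raise InsufficientFunds()
    item = take_from_room(room, item_id)
    if item is None:
        # Compensating update: give the gold back before reporting the conflict
        character['gold'] += price
        raise Conflict()
    character['inventory'].append(item)
    shopkeeper['gold'] += price
    return item
```

Because the charge happens before the pickup, a failed pickup is undone by a compensating update rather than by a transaction. Between the debit and the refund, another reader can briefly observe the reduced balance.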
Sharding -------- -Eventually your system's events will exceed the capacity of a single -event logging database instance. In these situations you will want to +If your system needs to scale beyond a single MongoDB node, you will want to use a :term:`shard cluster`, which takes advantage of MongoDB's -:term:`sharding` functionality. This section introduces the unique -sharding concerns for this use case. +:term:`sharding` functionality. .. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding` wiki page. + +Sharding in this use case is fairly +straightforward: every query selects documents by ``_id``, which makes +``_id`` a natural :term:`shard key`. To shard the ``character`` and +``location`` collections, run the following commands, where ``admin`` is a +handle to the ``admin`` database and ``dbname`` is the name of your database: + +.. code-block:: pycon + + >>> admin.command('shardcollection', 'dbname.character', key={'_id': 1}) + { "collectionsharded" : "dbname.character", "ok" : 1 } + >>> admin.command('shardcollection', 'dbname.location', key={'_id': 1}) + { "collectionsharded" : "dbname.location", "ok" : 1 } From 0f64f2bf83ce7b1211b800db8fdc765ce482262c Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Thu, 29 Mar 2012 15:35:59 -0400 Subject: [PATCH 03/17] Add ordered list for itemized explanation Signed-off-by: Rick Copeland --- .../use-cases/gaming-user-state.txt | 36 +++++++++---------- 1 file changed, 17 insertions(+), 19 deletions(-) diff --git a/source/applications/use-cases/gaming-user-state.txt b/source/applications/use-cases/gaming-user-state.txt index 6f71a5808e5..5f1d8cc9901 100644 --- a/source/applications/use-cases/gaming-user-state.txt +++ b/source/applications/use-cases/gaming-user-state.txt @@ -104,25 +104,23 @@ like the following: { qty:1, id:ObjectId('...'), name:"quarterstaff", bonus:2} ] } -There are a few things to note about this document. First, information -about the character's location in the game is encapsulated under the -``location`` attribute.
Note in particular that all of the information -necessary to render the room is encapsulated within the user's state -document. This allows the game system to render the room without -making a second query to the database to get room information. - -Second, notice that the ``armor`` and ``weapons`` attributes contain -little information about the actual items being worn or carried. This -information is actually stored under the ``inventory`` property. Since -the inventory information is stored in the same document, there is no -need to replicate the detailed information about each item into the -``armor`` and ``weapons`` properties. - -Finally, note that ``inventory`` contains the item details necessary -for rendering each item in the character's possession, including any -enchantments (``bonus``) and quantity (``qty``). Once again, embedding this data -into the character record means you don't have to perform a separate -query to fetch item details necessary for display. +There are a few things to note about this document: + +#. Information about the character's location in the game is encapsulated under + the ``location`` attribute. Note in particular that all of the information + necessary to render the room is encapsulated within the user's state + document. This allows the game system to render the room without making a + second query to the database to get room information. +#. The ``armor`` and ``weapons`` attributes contain little information about the + actual items being worn or carried. This information is actually stored under + the ``inventory`` property. Since the inventory information is stored in the + same document, there is no need to replicate the detailed information about + each item into the ``armor`` and ``weapons`` properties. +#. The ``inventory`` attribute contains the item details necessary for + rendering each item in the character's possession, including any enchantments + (``bonus``) and quantity (``qty``).
Once again, embedding this data into the + character record means you don't have to perform a separate query to fetch + item details necessary for display. Operations ---------- From 12cf2bf7a739771b254b3de3fcd78929545d8bc0 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Thu, 29 Mar 2012 16:42:02 -0400 Subject: [PATCH 04/17] Fix sharding commands, fix output of shardcollection to be inline with the interactive python session Signed-off-by: Rick Copeland --- .../cms-metadata-and-asset-management.txt | 454 ++++++++++++++++++ .../ecommerce-category-hierarchy.txt | 247 ++++++++++ .../ecommerce-inventory-management.txt | 408 ++++++++++++++++ .../use-cases/gaming-user-state.txt | 8 +- source/use-cases/hierarchical-aggregation.txt | 24 +- source/use-cases/pre-aggregated-reports.txt | 14 +- source/use-cases/product-catalog.txt | 5 - source/use-cases/storing-comments.txt | 5 +- 8 files changed, 1127 insertions(+), 38 deletions(-) create mode 100644 source/applications/use-cases/cms-metadata-and-asset-management.txt create mode 100644 source/applications/use-cases/ecommerce-category-hierarchy.txt create mode 100644 source/applications/use-cases/ecommerce-inventory-management.txt diff --git a/source/applications/use-cases/cms-metadata-and-asset-management.txt b/source/applications/use-cases/cms-metadata-and-asset-management.txt new file mode 100644 index 00000000000..c2693f8e7be --- /dev/null +++ b/source/applications/use-cases/cms-metadata-and-asset-management.txt @@ -0,0 +1,454 @@ +================================== +CMS: Metadata and Asset Management +================================== + +Problem +======= + +You are designing a content management system (CMS) and you want to use +MongoDB to store the content of your sites. + +Solution Overview +================= + +The approach in this solution is inspired by the design of Drupal, an +open source CMS written in PHP on relational databases that is available +at `http://www.drupal.org `_. 
In this case, you +will take advantage of MongoDB's dynamically typed collections to +*polymorphically* store all your content nodes in the same collection. +Navigational information will be stored in its own collection since +it has relatively little in common with the content nodes, and is not covered in +this use case. + +The main node types covered here are: + +Basic page + Basic pages are useful for displaying + infrequently-changing text such as an 'about' page. With a basic + page, the salient information is the title and the + content. +Blog entry + Blog entries record a "stream" of posts from users + on the CMS and store title, author, content, and date as relevant + information. +Photo + Photos participate in photo galleries, and store title, + description, author, and date along with the actual photo binary + data. + +Schema Design +============= + +The node collection contains documents of various formats, but they +all share a similar structure: each document includes an ``_id``, a +``nonce``, and a ``metadata`` subdocument with ``type``, ``section``, ``slug``, +``title``, ``created`` date, ``author``, and ``tags``. The +``section`` property is used to identify groupings of items (for instance, a +particular blog or photo gallery). The ``slug`` property is +a URL-friendly representation of the node that is unique within its +section, and is used for mapping URLs to nodes. Each document also +contains a ``detail`` field whose contents vary by node type: + +.. code-block:: javascript + + { + _id: ObjectId(…), + nonce: ObjectId(…), + metadata: { + type: 'basic-page', + section: 'my-photos', + slug: 'about', + title: 'About Us', + created: ISODate(...), + author: { _id: ObjectId(…), name: 'Rick' }, + tags: [ ... ], + detail: { text: '# About Us\n…' } + } + } + +For the basic page above, the detail field might simply contain the text +of the page. In the case of a blog entry, the document might resemble +the following instead: + +.. 
code-block:: javascript + + { + … + metadata: { + … + type: 'blog-entry', + section: 'my-blog', + slug: '2012-03-noticed-the-news', + … + detail: { + publish_on: ISODate(…), + text: 'I noticed the news from Washington today…' + } + } + } + +Photos present something of a special case. Since you'll need to store +potentially very large photos, it's useful to separate the binary +storage of photo data from the metadata storage. GridFS provides just such a +mechanism, splitting a "filesystem" of potentially very large "files" into +two collections, the ``files`` collection and the ``chunks`` collection. In +this case, the two collections will be called ``cms.assets.files`` and +``cms.assets.chunks``. Documents in the ``cms.assets.files`` +collection will be used to store the normal GridFS metadata as well as CMS node +metadata: + +.. code-block:: javascript + + { + _id: ObjectId(…), + length: 123..., + chunkSize: 262144, + uploadDate: ISODate(…), + contentType: 'image/jpeg', + md5: 'ba49a...', + metadata: { + nonce: ObjectId(…), + slug: '2012-03-invisible-bicycle', + type: 'photo', + section: 'my-album', + title: 'Kitteh', + created: ISODate(…), + author: { _id: ObjectId(…), name: 'Jared' }, + tags: [ … ], + detail: { + filename: 'kitteh_invisible_bike.jpg', + resolution: [ 1600, 1600 ], … } + } + } + +Note that the "normal" node schema is embedded here in the photo schema, allowing +the use of the same code to manipulate nodes of all types. + +Operations +========== + +This section describes some common queries and updates that you might need for +your CMS, paying particular attention to any "tweaks" necessary for the various +node types. The examples use the Python +programming language and the ``pymongo`` MongoDB driver, but implementations +would be similar in other languages as well. + +Create and Edit Content Nodes +----------------------------- + +The content producers using your CMS will be creating and editing content +most of the time.
Most content-creation activities are relatively +straightforward: + +.. code-block:: python + + db.cms.nodes.insert({ + 'nonce': ObjectId(), + 'metadata': { + 'section': 'my-blog', + 'slug': '2012-03-noticed-the-news', + 'type': 'blog-entry', + 'title': 'Noticed in the News', + 'created': datetime.utcnow(), + 'author': { '_id': user_id, 'name': 'Rick' }, + 'tags': [ 'news', 'musings' ], + 'detail': { + 'publish_on': datetime.utcnow(), + 'text': 'I noticed the news from Washington today…' } + } + }) + +Once the node is in the database, multiple editors may try to modify it +concurrently. To handle this, the schema uses the special ``nonce`` +value to detect when another editor may have modified the document and +allow the application to resolve any conflicts: + +.. code-block:: python + + def update_text(section, slug, nonce, text): + result = db.cms.nodes.update( + { 'metadata.section': section, + 'metadata.slug': slug, + 'nonce': nonce }, + { '$set':{'metadata.detail.text': text, 'nonce': ObjectId() } }, + safe=True) + if not result['updatedExisting']: + raise ConflictError() + +You might also want to perform metadata edits to the item such as adding +tags: + +.. code-block:: python + + db.cms.nodes.update( + { 'metadata.section': section, 'metadata.slug': slug }, + { '$addToSet': { 'metadata.tags': { '$each': [ 'interesting', 'funny' ] } } }) + +In this case, you don't actually need to supply the nonce (nor update it) +since you're using the atomic ``$addToSet`` modifier in MongoDB. + +Index Support +~~~~~~~~~~~~~ + +Updates in this case are based on equality queries containing the +(``section``, ``slug``, and ``nonce``) values. To support these queries, you +*might* use the following index: + +.. code-block:: pycon + + >>> db.cms.nodes.ensure_index([ + ... ('metadata.section', 1), ('metadata.slug', 1), ('nonce', 1) ]) + +You also need to ensure that two editors don't +create two documents with the same section and slug.
To support this, you need a +second index with a unique constraint: + +.. code-block:: python + + >>> db.cms.nodes.ensure_index([ + ... ('metadata.section', 1), ('metadata.slug', 1)], unique=True) + +In fact, since the expectation is that most of the time (``section``, ``slug``, +``nonce``) is going to be unique, you don't actually get much benefit from the +first index and can use only the second one to satisfy the update queries as +well. + +Upload a Photo +-------------- + +Uploading photos shares some things in common with node +update, but it also has some extra nuances: + +.. code-block:: python + + def upload_new_photo( + input_file, section, slug, title, author, tags, details): + fs = GridFS(db, 'cms.assets') + with fs.new_file( + content_type='image/jpeg', + metadata=dict( + type='photo', + locked=datetime.utcnow(), + section=section, + slug=slug, + title=title, + created=datetime.utcnow(), + author=author, + tags=tags, + detail=detail)) as upload_file: + while True: + chunk = input_file.read(upload_file.chunk_size) + if not chunk: break + upload_file.write(chunk) + # unlock the file + db.assets.files.update( + {'_id': upload_file._id}, + {'$set': { 'locked': None } } ) + +Here, since uploading the photo is a non-atomic operation, you need to +"lock" the file during upload by writing the current datetime into the +record. This lets the application detect when a file upload may be stalled, which +is helpful when working with multiple editors. This solution assumes that, for +photo upload, the last update wins: + +.. 
code-block:: python + + def update_photo_content(input_file, section, slug): + fs = GridFS(db, 'cms.assets') + + # Delete the old version if it's unlocked or was locked more than 5 + # minutes ago + file_obj = db.cms.assets.find_one( + { 'metadata.section': section, + 'metadata.slug': slug, + 'metadata.locked': None }) + if file_obj is None: + threshold = datetime.utcnow() - timedelta(seconds=300) + file_obj = db.cms.assets.find_one( + { 'metadata.section': section, + 'metadata.slug': slug, + 'metadata.locked': { '$lt': threshold } }) + if file_obj is None: raise FileDoesNotExist() + fs.delete(file_obj['_id']) + + # update content, keep metadata unchanged + file_obj['locked'] = datetime.utcnow() + with fs.new_file(**file_obj): + while True: + chunk = input_file.read(upload_file.chunk_size) + if not chunk: break + upload_file.write(chunk) + # unlock the file + db.assets.files.update( + {'_id': upload_file._id}, + {'$set': { 'locked': None } } ) + +You can, of course, perform metadata edits to the item such as adding +tags without the extra complexity: + +.. code-block:: python + + db.cms.assets.files.update( + { 'metadata.section': section, 'metadata.slug': slug }, + { '$addToSet': { + 'metadata.tags': { '$each': [ 'interesting', 'funny' ] } } }) + +Index Support +~~~~~~~~~~~~~ + +Updates here are also based on equality queries containing the +(``section``, ``slug``) values, so you can use the same types of indexes as were +used in the "regular" node case. Note in particular that you need a +unique constraint on (``section``, ``slug``) to ensure that one of the calls to +``GridFS.new_file()`` will fail if multiple editors try to create or update +the file simultaneously. + +.. code-block:: python + + >>> db.cms.assets.files.ensure_index([ + ... 
('metadata.section', 1), ('metadata.slug', 1)], unique=True) + +Locate and Render a Node +------------------------ + +You need to be able to locate a node based on its section and slug, which +have been extracted from the page definition and URL by some +other technology. + +.. code-block:: python + + node = db.nodes.find_one( + {'metadata.section': section, 'metadata.slug': slug }) + +Index Support +~~~~~~~~~~~~~ + +The same indexes defined above on (``section``, ``slug``) would +efficiently render this node. + +Locate and Render a Photo +------------------------- + +You want to locate an image based on its section and slug, +which have been extracted from the page definition and URL +just as with other nodes. + +.. code-block:: python + + fs = GridFS(db, 'cms.assets') + with fs.get_version( + **{'metadata.section': section, 'metadata.slug': slug }) as img_fp: + # do something with the image file + +Index Support +~~~~~~~~~~~~~ + +The same indexes defined above on (``section``, ``slug``) would also +efficiently render this image. + +Search for Nodes by Tag +----------------------- + +You'd like to retrieve a list of nodes based on their tags: + +.. code-block:: python + + nodes = db.nodes.find({'metadata.tags': tag }) + +Index Support +~~~~~~~~~~~~~ + +To support searching efficiently, you should define indexes on any fields +you intend on using in your query: + +.. code-block:: python + + >>> db.cms.nodes.ensure_index('tags') + +Search for Images by Tag +------------------------ + +Here, you'd like to retrieve a list of images based on their tags: + +.. 
code-block:: python + + image_file_objects = db.cms.assets.files.find({'metadata.tags': tag }) + fs = GridFS(db, 'cms.assets') + for image_file_object in db.cms.assets.files.find( + {'metadata.tags': tag }): + image_file = fs.get(image_file_object['_id']) + # do something with the image file + +Index Support +~~~~~~~~~~~~~ + +As above, in order to support searching efficiently, you should define +indexes on any fields you expect to use in the query: + +.. code-block:: python + + >>> db.cms.assets.files.ensure_index('tags') + +Generate a Feed of Recently Published Blog Articles +--------------------------------------------------- + +Here, you need to generate an .rss or .atom feed for your recently +published blog articles, sorted by date descending: + +.. code-block:: python + + articles = db.nodes.find({ + 'metadata.section': 'my-blog' + 'metadata.published': { '$lt': datetime.utcnow() } }) + articles = articles.sort({'metadata.published': -1}) + +Index Support +~~~~~~~~~~~~~ + +In order to support this operation, you'll need to create an index on (``section``, +``published``) so the items are 'in order' for the query. Note that in cases +where you're sorting or using range queries, as here, the field on which +you're sorting or performing a range query must be the final field in the +index: + +.. code-block:: python + + >>> db.cms.nodes.ensure_index( + ... [ ('metadata.section', 1), ('metadata.published', -1) ]) + +Sharding +======== + +In a CMS system, read performance is generally much more important +than write performance. As such, you'll want to optimize the sharding setup +for read performance. In order to achieve the best read performance, you +need to ensure that queries are *routeable* by the mongos process. A +second consideration when sharding is that unique indexes do not span +shards. As such, the shard key must include the unique indexes in order to get +the same semantics as described above. 
Given +these constraints, sharding the nodes and assets on (``section``, ``slug``) +is a reasonable approach: + +.. code-block:: python + + >>> db.command('shardcollection', 'cms.nodes', { + ... key : { 'metadata.section': 1, 'metadata.slug' : 1 } }) + { "collectionsharded" : "cms.nodes", "ok" : 1 } + >>> db.command('shardcollection', 'cms.assets.files', { + ... key : { 'metadata.section': 1, 'metadata.slug' : 1 } }) + { "collectionsharded" : "cms.assets.files", "ok" : 1 } + +If you wish to shard the ``cms.assets.chunks`` collection, you need to shard +on the ``_id`` field (none of the node metadata is available on the +``cms.assets.chunks`` collection in GridFS:) + +.. code-block:: python + + >>> db.command('shardcollection', 'cms.assets.chunks', { + ... key: { '_id': 1 } }) + { "collectionsharded" : "cms.assets.chunks", "ok" : 1 } + +This actually still maintains the query-routability constraint, since +all reads from GridFS must first look up the document in ``cms.assets.files`` and +then look up the chunks separately (though the GridFS API sometimes +hides this detail.) diff --git a/source/applications/use-cases/ecommerce-category-hierarchy.txt b/source/applications/use-cases/ecommerce-category-hierarchy.txt new file mode 100644 index 00000000000..38c49aa5e8f --- /dev/null +++ b/source/applications/use-cases/ecommerce-category-hierarchy.txt @@ -0,0 +1,247 @@ +============================== +E-Commerce: Category Hierarchy +============================== + +Problem +======= + +You have a product hierarchy for an e-commerce site that you want to +query frequently and update somewhat frequently. + +Solution Overview +================= + +This solution keeps each category in its own document, along with a list of its +ancestors. The category hierarchy used in this example will be +based on different categories of music: + +.. 
figure:: img/ecommerce-category1.png + :align: center + :alt: Initial category hierarchy + + Initial category hierarchy + +Since categories change relatively infrequently, the focus here will be on the +operations needed to keep the hierarchy up-to-date and less on the performance +aspects of updating the hierarchy. + +Schema Design +============= + +Each category in the hierarchy will be represented by a document. That +document will be identified by an ``ObjectId`` for internal +cross-referencing as well as a human-readable name and a url-friendly +``slug`` property. Additionally, the schema stores an ancestors list along +with each document to facilitate displaying a category along with all +its ancestors in a single query. + +.. code-block:: javascript + + { "_id" : ObjectId("4f5ec858eb03303a11000002"), + "name" : "Modal Jazz", + "parent" : ObjectId("4f5ec858eb03303a11000001"), + "slug" : "modal-jazz", + "ancestors" : [ + { "_id" : ObjectId("4f5ec858eb03303a11000001"), + "slug" : "bop", + "name" : "Bop" }, + { "_id" : ObjectId("4f5ec858eb03303a11000000"), + "slug" : "ragtime", + "name" : "Ragtime" } ] + } + +Operations +========== + +Here, the various category manipulations you may need in an ecommerce site are +described as they would occur using the schema above. The examples use the Python +programming language and the ``pymongo`` MongoDB driver, but implementations +would be similar in other languages as well. + +Read and Display a Category +--------------------------- + +The simplest operation is reading and displaying a hierarchy. In this +case, you might want to display a category along with a list of "bread +crumbs" leading back up the hierarchy. In an E-commerce site, you'll +most likely have the slug of the category available for your query, as it can be +parsed from the URL. + +.. 
code-block:: python + + category = db.categories.find( + {'slug':slug}, + {'_id':0, 'name':1, 'ancestors.slug':1, 'ancestors.name':1 }) + +Here, the slug is used to retrieve the category, fetching only those +fields needed for display. + +Index Support +~~~~~~~~~~~~~ + +In order to support this common operation efficiently, you'll need an index +on the 'slug' field. Since slug is also intended to be unique, the index over it +should be unique as well: + +.. code-block:: python + + db.categories.ensure_index('slug', unique=True) + +Add a Category to the Hierarchy +------------------------------- + +Adding a category to a hierarchy is relatively simple. Suppose you wish +to add a new category 'Swing' as a child of 'Ragtime': + +.. figure:: img/ecommerce-category2.png + :align: center + :alt: Adding a category + + Adding a category + +In this case, the initial insert is simple enough, but after this +insert, the "Swing" category is still missing its ancestors array. To define +this, you'll need a helper function to build the ancestor list: + +.. code-block:: python + + def build_ancestors(_id, parent_id): + parent = db.categories.find_one( + {'_id': parent_id}, + {'name': 1, 'slug': 1, 'ancestors':1}) + parent_ancestors = parent.pop('ancestors') + ancestors = [ parent ] + parent_ancestors + db.categories.update( + {'_id': _id}, + {'$set': { 'ancestors': ancestors } }) + +Note that you only need to travel one level in the hierarchy to get the +ragtime's ancestors and build swing's entire ancestor list. Now you can +actually perform the insert and rebuild the ancestor list: + +.. code-block:: python + + doc = dict(name='Swing', slug='swing', parent=ragtime_id) + swing_id = db.categories.insert(doc) + build_ancestors(swing_id, ragtime_id) + +Index Support +~~~~~~~~~~~~~ + +Since these queries and updates all selected based on ``_id``, you only need +the default MongoDB-supplied index on ``_id`` to support this operation +efficiently. 
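+
+The rule that ``build_ancestors`` relies on is that a child's ancestor list
+is its parent followed by the parent's own ancestors. The sketch below
+illustrates that rule without a database. It is a minimal sketch that
+assumes a plain Python dict keyed by ``_id`` as a stand-in for the
+``categories`` collection and uses string ids instead of ``ObjectId``
+values; the helper names (``make_category``, ``summary``,
+``ancestors_one_level``, ``ancestors_full_walk``) are hypothetical. It shows
+that the one-level shortcut agrees with a full walk up the ``parent`` links,
+provided each parent's list is already correct:
+
+.. code-block:: python
+
+    def make_category(_id, name, slug, parent):
+        # Hypothetical in-memory stand-in for a document in db.categories
+        return {'_id': _id, 'name': name, 'slug': slug,
+                'parent': parent, 'ancestors': []}
+
+    def summary(doc):
+        # The fields stored for each entry in an ancestors list
+        return {'_id': doc['_id'], 'name': doc['name'], 'slug': doc['slug']}
+
+    def ancestors_one_level(categories, parent_id):
+        # Same rule as build_ancestors: [parent] + the parent's own ancestors
+        parent = categories[parent_id]
+        return [summary(parent)] + list(parent['ancestors'])
+
+    def ancestors_full_walk(categories, parent_id):
+        # Follow 'parent' links all the way to the root, nearest first
+        ancestors = []
+        while parent_id is not None:
+            parent = categories[parent_id]
+            ancestors.append(summary(parent))
+            parent_id = parent['parent']
+        return ancestors
+
+    categories = {}
+    for _id, name, parent in [('ragtime', 'Ragtime', None),
+                              ('bop', 'Bop', 'ragtime'),
+                              ('modal-jazz', 'Modal Jazz', 'bop'),
+                              ('swing', 'Swing', 'ragtime')]:
+        # Parents are listed before their children, so each parent's
+        # ancestors list is already correct when a child is added
+        categories[_id] = make_category(_id, name, _id, parent)
+        if parent is not None:
+            categories[_id]['ancestors'] = ancestors_one_level(
+                categories, parent)
+
+    print([a['slug'] for a in categories['modal-jazz']['ancestors']])
+
+Running this prints ``['bop', 'ragtime']``, the same nearest-first order as
+the "Modal Jazz" document in the schema above. The shortcut is only safe
+while every parent's list is current, which is why a full walk up the
+hierarchy is needed when categories are moved.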
+
+Change the Ancestry of a Category
+---------------------------------
+
+Suppose you wish to reorganize the hierarchy by moving 'bop' under
+'swing':
+
+.. figure:: img/ecommerce-category3.png
+   :align: center
+   :alt: Change the parent of a category
+
+   Change the parent of a category
+
+The initial update is straightforward:
+
+.. code-block:: python
+
+    db.categories.update(
+        {'_id':bop_id}, {'$set': { 'parent': swing_id } } )
+
+Now, you still need to update the ancestor list for bop and all its
+descendants. In this case, you can't rely on each parent's ancestor list
+being correct, since the descendants may be processed in any order and a
+child may be rebuilt before its parent. To handle this, you'll need a new
+ancestor-building function that walks all the way up the hierarchy:
+
+.. code-block:: python
+
+    def build_ancestors_full(_id, parent_id):
+        ancestors = []
+        while parent_id is not None:
+            parent = db.categories.find_one(
+                {'_id': parent_id},
+                {'parent': 1, 'name': 1, 'slug': 1})
+            parent_id = parent.pop('parent')
+            ancestors.append(parent)
+        db.categories.update(
+            {'_id': _id},
+            {'$set': { 'ancestors': ancestors } })
+
+Now, at the expense of a few more queries up the hierarchy, you can
+easily rebuild the ancestor lists of 'bop' and all of its descendants:
+
+.. code-block:: python
+
+    build_ancestors_full(bop_id, swing_id)
+    for cat in db.categories.find(
+            {'ancestors._id': bop_id},
+            {'parent': 1}):
+        build_ancestors_full(cat['_id'], cat['parent'])
+
+Index Support
+~~~~~~~~~~~~~
+
+In this case, an index on ``ancestors._id`` would be helpful in
+determining which descendants need to be updated:
+
+.. code-block:: python
+
+    db.categories.ensure_index('ancestors._id')
+
+Rename a Category
+-----------------
+
+Renaming a category would normally be an extremely quick operation, but
+in this case, due to denormalization, you also need to update the
+descendants. Suppose you need to rename "Bop" to "BeBop":
+
+.. figure:: img/ecommerce-category4.png
+   :align: center
+   :alt: Rename a category
+
+   Rename a category
+
+First, you need to update the category name itself:
+
+.. code-block:: python
+
+    db.categories.update(
+        {'_id':bop_id}, {'$set': { 'name': 'BeBop' } } )
+
+Next, you need to update each descendant's ancestors list:
+
+.. code-block:: python
+
+    db.categories.update(
+        {'ancestors._id': bop_id},
+        {'$set': { 'ancestors.$.name': 'BeBop' } },
+        multi=True)
+
+Here, you can use the positional operator ``$`` to update the exact
+"ancestor" entry that matches the query, as well as the ``multi`` option on
+the update to ensure the rename operation occurs in a single server
+round-trip.
+
+Index Support
+~~~~~~~~~~~~~
+
+In this case, the index you have already defined on ``ancestors._id`` is
+sufficient to ensure good performance.
+
+Sharding
+========
+
+In this solution, it is unlikely that you would want to shard the
+collection, since it's likely to be quite small. If you *should* decide to
+shard, the use of the ``_id`` field for most updates makes ``_id`` an
+ideal shard key. The sharding command you'd use to shard
+the category collection would then be the following:
+
+.. code-block:: python
+
+    >>> db.command('shardcollection', 'categories',
+    ...     key={'_id': 1})
+    { "collectionsharded" : "categories", "ok" : 1 }
diff --git a/source/applications/use-cases/ecommerce-inventory-management.txt b/source/applications/use-cases/ecommerce-inventory-management.txt
new file mode 100644
index 00000000000..e78b070c87e
--- /dev/null
+++ b/source/applications/use-cases/ecommerce-inventory-management.txt
@@ -0,0 +1,408 @@
+================================
+E-Commerce: Inventory Management
+================================
+
+.. default-domain:: mongodb
+
+Overview
+--------
+
+Problem
+~~~~~~~
+
+You have a product catalog and you would like to maintain an accurate
+inventory count as users shop your online store, adding and removing
+things from their cart.
+
+Solution
+~~~~~~~~
+
+In an ideal world, consumers would begin browsing an online store, add
+items to their shopping cart, and proceed in a timely manner to checkout
+where their credit cards would always be successfully validated and
+charged. In the real world, however, customers often add or remove items
+from their shopping cart, change quantities, abandon the cart, and have
+problems at checkout time.
+
+This solution keeps the traditional metaphor of the shopping cart, but
+the shopping cart will *age*. Once a shopping cart has not been active
+for a certain period of time, all the items in the cart once again
+become part of available inventory and the cart is cleared. The state
+transition diagram for a shopping cart is below:
+
+.. figure:: img/ecommerce-inventory1.png
+   :align: center
+   :alt: Shopping cart state transition diagram
+
+Schema
+~~~~~~
+
+In your inventory collection, you need to maintain the current available
+inventory of each stock-keeping unit (SKU) as well as a list of 'carted'
+items that may be released back to available inventory if their shopping
+cart times out:
+
+.. code-block:: javascript
+
+    {
+      _id: '00e8da9b',
+      qty: 16,
+      carted: [
+        { qty: 1, cart_id: 42,
+          timestamp: ISODate("2012-03-09T20:55:36Z"), },
+        { qty: 2, cart_id: 43,
+          timestamp: ISODate("2012-03-09T21:55:36Z"), },
+      ]
+    }
+
+(In an actual implementation, you might choose to merge
+this schema with the product catalog schema described in
+:doc:`E-Commerce: Product Catalog `, but the inventory
+schema is simplified here for brevity.) Continuing the metaphor of the
+brick-and-mortar store, your SKU above has 16 items on the shelf, 1 in one cart,
+and 2 in another for a total of 19 unsold items of merchandise.
+
+For the shopping cart model, you need to maintain a list of (``sku``,
+``quantity``, ``price``) line items:
+
+.. code-block:: javascript
+
+    {
+      _id: 42,
+      last_modified: ISODate("2012-03-09T20:55:36Z"),
+      status: 'active',
+      items: [
+        { sku: '00e8da9b', qty: 1, details: {...} },
+        { sku: '0ab42f88', qty: 4, details: {...} }
+      ]
+    }
+
+Note that the cart model includes item details in each line
+item. This allows your app to display the contents of the cart to the user
+without needing a second query back to the catalog collection to fetch
+the details.
+
+Operations
+----------
+
+This section describes the inventory-related operations in an e-commerce
+site as they would occur using the schema above. The examples use the Python
+programming language and the ``pymongo`` MongoDB driver, but implementations
+would be similar in other languages as well.
+
+Add an Item to a Shopping Cart
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Process
+```````
+
+The most basic operation is moving an item off the "shelf" into the
+"cart". The constraint is that you would like to guarantee that you never
+move an unavailable item off the shelf into the cart. To solve this
+problem, this solution ensures that inventory is only updated if there is
+sufficient inventory to satisfy the request:
+
+.. code-block:: python
+
+    def add_item_to_cart(cart_id, sku, qty, details):
+        now = datetime.utcnow()
+
+        # Make sure the cart is still active and add the line item
+        result = db.cart.update(
+            {'_id': cart_id, 'status': 'active' },
+            { '$set': { 'last_modified': now },
+              '$push': {
+                  'items': {'sku': sku, 'qty':qty, 'details': details } }
+            },
+            safe=True)
+        if not result['updatedExisting']:
+            raise CartInactive()
+
+        # Update the inventory
+        result = db.inventory.update(
+            {'_id':sku, 'qty': {'$gte': qty}},
+            {'$inc': {'qty': -qty},
+             '$push': {
+                 'carted': { 'qty': qty, 'cart_id':cart_id,
+                             'timestamp': now } } },
+            safe=True)
+        if not result['updatedExisting']:
+            # Roll back our cart update
+            db.cart.update(
+                {'_id': cart_id },
+                { '$pull': { 'items': {'sku': sku } } }
+            )
+            raise InadequateInventory()
+
+Note here in particular that the system does not trust that the request is
+satisfiable. The first check makes sure that the cart is still "active"
+(more on inactive carts below) before adding a line item. The next check
+verifies that sufficient inventory exists to satisfy the request before
+decrementing inventory. In the case of inadequate inventory, the system
+*compensates* for the non-transactional nature of MongoDB by removing the
+cart update. Using ``safe=True`` and checking the result of
+these two updates allows you to report an error to the user if the
+cart has become inactive or available quantity is insufficient to
+satisfy the request.
+
+Indexing
+````````
+
+To support this query efficiently, all you really need is an index on
+``_id``, which MongoDB provides by default.
+
+Modifying the Quantity in the Cart
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Process
+```````
+
+Here, you'd like to allow the user to adjust the quantity of items in their
+cart. The system must ensure that when they adjust the quantity upward, there
+is sufficient inventory to cover the quantity, as well as updating the
+particular ``carted`` entry for the user's cart.
+
+.. code-block:: python
+
+    def update_quantity(cart_id, sku, old_qty, new_qty):
+        now = datetime.utcnow()
+        delta_qty = new_qty - old_qty
+
+        # Make sure the cart is still active and update the line item
+        result = db.cart.update(
+            {'_id': cart_id, 'status': 'active', 'items.sku': sku },
+            {'$set': {
+                'last_modified': now,
+                'items.$.qty': new_qty },
+            },
+            safe=True)
+        if not result['updatedExisting']:
+            raise CartInactive()
+
+        # Update the inventory
+        result = db.inventory.update(
+            {'_id':sku,
+             'carted.cart_id': cart_id,
+             'qty': {'$gte': delta_qty} },
+            {'$inc': {'qty': -delta_qty },
+             '$set': { 'carted.$.qty': new_qty,
+                       'carted.$.timestamp': now } },
+            safe=True)
+        if not result['updatedExisting']:
+            # Roll back our cart update
+            db.cart.update(
+                {'_id': cart_id, 'items.sku': sku },
+                {'$set': { 'items.$.qty': old_qty } })
+            raise InadequateInventory()
+
+Note in particular here the use of the positional operator ``$`` to
+update the particular ``carted`` entry and line item that matched the
+query. This allows the system to update the inventory and keep track of the data
+necessary to "roll back" the cart in a single atomic operation. The code above
+also ensures the cart is active and timestamps it, as in the case of adding
+items to the cart.
+
+Indexing
+````````
+
+To support this query efficiently, again all you really need is an index on ``_id``.
+
+Checking Out
+~~~~~~~~~~~~
+
+Process
+```````
+
+During checkout, you'd like to validate the method of payment and remove
+the various ``carted`` items after the transaction has succeeded.
+
+.. code-block:: python
+
+    def checkout(cart_id):
+        now = datetime.utcnow()
+
+        # Make sure the cart is still active and set to 'pending'. Also
+        # fetch the cart details so we can calculate the checkout price
+        cart = db.cart.find_and_modify(
+            {'_id': cart_id, 'status': 'active' },
+            update={'$set': { 'status': 'pending','last_modified': now } } )
+        if cart is None:
+            raise CartInactive()
+
+        # Validate payment details; collect payment
+        try:
+            collect_payment(cart)
+            db.cart.update(
+                {'_id': cart_id },
+                {'$set': { 'status': 'complete' } } )
+            db.inventory.update(
+                {'carted.cart_id': cart_id},
+                {'$pull': {'carted': {'cart_id': cart_id} } },
+                multi=True)
+        except Exception:
+            db.cart.update(
+                {'_id': cart_id },
+                {'$set': { 'status': 'active' } } )
+            raise
+
+Here, the cart is first "locked" by setting its status to "pending"
+(disabling any modifications). MongoDB's ``findAndModify`` command is used
+to atomically verify that the cart is still active, update its status, and
+return its details so you can capture payment information. If the payment is
+successful, you then remove the ``carted`` items from individual items'
+inventory and set the cart to "complete". If payment is unsuccessful, you
+unlock the cart by setting its status back to "active" and report a
+payment error.
+
+Indexing
+````````
+
+Once again the ``_id`` default index is enough to make this operation efficient.
+
+Returning Timed-Out Items to Inventory
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Process
+```````
+
+Periodically, you'd like to expire carts that have been inactive for a
+given number of seconds, returning their line items to available
+inventory:
+
+.. code-block:: python
+
+    def expire_carts(timeout):
+        now = datetime.utcnow()
+        threshold = now - timedelta(seconds=timeout)
+
+        # Lock and find all the expiring carts
+        db.cart.update(
+            {'status': 'active', 'last_modified': { '$lt': threshold } },
+            {'$set': { 'status': 'expiring' } },
+            multi=True )
+
+        # Actually expire each cart
+        for cart in db.cart.find({'status': 'expiring'}):
+
+            # Return all line items to inventory
+            for item in cart['items']:
+                db.inventory.update(
+                    { '_id': item['sku'],
+                      'carted': { '$elemMatch': {
+                          'cart_id': cart['_id'],
+                          'qty': item['qty'] } }
+                    },
+                    {'$inc': { 'qty': item['qty'] },
+                     '$pull': { 'carted': { 'cart_id': cart['_id'] } } })
+
+            db.cart.update(
+                {'_id': cart['_id'] },
+                {'$set': { 'status': 'expired' } })
+
+Here, you first find all carts to be expired and then, for each cart,
+return its items to inventory. Once all items have been returned to
+inventory, the cart is moved to the 'expired' state.
+
+Indexing
+````````
+
+In this case, you need to be able to efficiently query carts based on
+their ``status`` and ``last_modified`` values, so an index on these would help
+the performance of the periodic expiration process:
+
+.. code-block:: python
+
+    db.cart.ensure_index([('status', 1), ('last_modified', 1)])
+
+Note in particular the order in which the index is defined: in order to
+efficiently support range queries (``$lt`` in this case), the ranged field
+must be the last field in the index. Also note that there is no need to
+define an index on the ``status`` field alone, as any queries for status
+can use the compound index defined here.
+
+Error Handling
+~~~~~~~~~~~~~~
+
+There is one failure mode above that thus far has not been handled adequately: the
+case of an exception that occurs after updating the inventory collection
+but before updating the shopping cart. The result of this failure mode
+is a shopping cart that may be absent or expired where the 'carted'
+items in the inventory have not been returned to available inventory. To
+account for this case, you'll need to run a cleanup method periodically that
+will find old ``carted`` items and check the status of their cart:
+
+.. code-block:: python
+
+    def cleanup_inventory(timeout):
+        now = datetime.utcnow()
+        threshold = now - timedelta(seconds=timeout)
+
+        # Find all the expiring carted items
+        for item in db.inventory.find(
+                {'carted.timestamp': {'$lt': threshold }}):
+
+            # Find all the carted items that matched
+            carted = dict(
+                (carted_item['cart_id'], carted_item)
+                for carted_item in item['carted']
+                if carted_item['timestamp'] < threshold)
+
+            # Find any carts that are active and refresh the carted items
+            for cart in db.cart.find(
+                    { '_id': {'$in': list(carted.keys()) },
+                      'status':'active'}):
+                db.inventory.update(
+                    { '_id': item['_id'],
+                      'carted.cart_id': cart['_id'] },
+                    { '$set': {'carted.$.timestamp': now } })
+                del carted[cart['_id']]
+
+            # All the carted items left in the dict need to now be
+            # returned to inventory
+            for cart_id, carted_item in carted.items():
+                db.inventory.update(
+                    { '_id': item['_id'],
+                      'carted': { '$elemMatch': {
+                          'cart_id': cart_id,
+                          'qty': carted_item['qty'] } } },
+                    { '$inc': { 'qty': carted_item['qty'] },
+                      '$pull': { 'carted': { 'cart_id': cart_id } } })
+
+Note that the function above is safe, as it only returns carted items to
+inventory when their cart is no longer active. This function could,
+however, be slow, as well as slow down other updates and queries, so it
+should be run infrequently.
+
+Sharding
+--------
+
+If you choose to shard this system, the use of an ``_id`` field for most of
+the updates makes ``_id`` an ideal sharding candidate, for both carts and
+products. Using ``_id`` as your shard key allows all updates that query on
+``_id`` to be routed to a single mongod process. There are two potential
+drawbacks with using ``_id`` as a shard key, however.
+
+- If the cart collection's ``_id`` is generated in a generally increasing
+  order, new carts will all initially be assigned to a single shard.
+- Cart expiration and inventory adjustment require several broadcast
+  queries and updates if ``_id`` is used as a shard key.
+
+You can mitigate the first pitfall by choosing a random
+value (perhaps the SHA-1 hash of an ``ObjectId``) as the ``_id`` of each cart
+as it is created. The second objection is valid, but relatively
+unimportant, as the expiration function runs relatively infrequently and can be
+slowed down by the judicious use of ``sleep()`` calls in order to
+minimize server load.
+
+The sharding commands you'd use to shard the cart and inventory
+collections, then, would be the following:
+
+.. code-block:: pycon
+
+    >>> db.command('shardcollection', 'inventory',
+    ...     key={'_id': 1})
+    { "collectionsharded" : "inventory", "ok" : 1 }
+    >>> db.command('shardcollection', 'cart',
+    ...     key={'_id': 1})
+    { "collectionsharded" : "cart", "ok" : 1 }
diff --git a/source/applications/use-cases/gaming-user-state.txt b/source/applications/use-cases/gaming-user-state.txt
index 5f1d8cc9901..97117794b3f 100644
--- a/source/applications/use-cases/gaming-user-state.txt
+++ b/source/applications/use-cases/gaming-user-state.txt
@@ -436,10 +436,10 @@ following:
 
 .. code-block:: pycon
 
-    >>> db.command('shardcollection', 'character')
+    >>> db.command('shardcollection', 'character', {
+    ...     'key': { '_id': 1 } })
     { "collectionsharded" : "character", "ok" : 1 }
-    >>> db.command('shardcollection', 'location'))
+    >>> db.command('shardcollection', 'location', {
+    ...     'key': { '_id': 1 } })
     { "collectionsharded" : "location", "ok" : 1 }
 
-Note that there is no need here to specify a :term:`shard key` since MongoDB
-shards on ``_id`` by default.
diff --git a/source/use-cases/hierarchical-aggregation.txt b/source/use-cases/hierarchical-aggregation.txt
index 36b4cc71d84..dd035739ef6 100644
--- a/source/use-cases/hierarchical-aggregation.txt
+++ b/source/use-cases/hierarchical-aggregation.txt
@@ -469,24 +469,22 @@ timestamp) on the events collection. Consider the following:
 .. code-block:: pycon
 
     >>> db.command('shardcollection','events', {
-    ...     key : { 'userid': 1, 'ts' : 1} } )
-
-Upon success, you will see the following response:
-
-.. code-block:: javascript
-
+    ...     'key' : { 'userid': 1, 'ts' : 1} } )
     { "collectionsharded": "events", "ok" : 1 }
 
-To shard the aggregated collections you must use the ``_id`` field,
-which is the default, so you can issue the following group of shard
-operations in the Python/PyMongo shell:
+To shard the aggregated collections you must use the ``_id`` field, so you can
+issue the following group of shard operations in the Python/PyMongo shell:
 
 .. code-block:: python
 
-    db.command('shardcollection', 'stats.daily')
-    db.command('shardcollection', 'stats.weekly')
-    db.command('shardcollection', 'stats.monthly')
-    db.command('shardcollection', 'stats.yearly')
+    db.command('shardcollection', 'stats.daily', {
+        'key': { '_id': 1 } })
+    db.command('shardcollection', 'stats.weekly', {
+        'key': { '_id': 1 } })
+    db.command('shardcollection', 'stats.monthly', {
+        'key': { '_id': 1 } })
+    db.command('shardcollection', 'stats.yearly', {
+        'key': { '_id': 1 } })
 
 You should also update the ``h_aggregate`` map-reduce wrapper to support
 sharded output Add ``'sharded':True`` to the ``out``
diff --git a/source/use-cases/pre-aggregated-reports.txt b/source/use-cases/pre-aggregated-reports.txt
index 7e9424f2e17..298a96e6350 100644
--- a/source/use-cases/pre-aggregated-reports.txt
+++ b/source/use-cases/pre-aggregated-reports.txt
@@ -613,12 +613,7 @@ collection in the Python/PyMongo console:
 .. code-block:: pycon
 
     >>> db.command('shardcollection', 'stats.daily', {
-    ...     key:{'metadata.site':1,'metadata.page':1,'metadata.date':1}})
-
-Upon success, you will see the following response:
-
-.. code-block:: javascript
-
+    ...     'key':{'metadata.site':1,'metadata.page':1,'metadata.date':1}})
     { "collectionsharded" : "stats.daily", "ok" : 1 }
 
 Enable sharding for the monthly statistics collection with the
@@ -628,12 +623,7 @@ console:
 .. code-block:: pycon
 
     >>> db.command('shardcollection', 'stats.monthly', {
-    ...     key:{'metadata.site':1,'metadata.page':1,'metadata.date':1}})
-
-Upon success, you will see the following response:
-
-.. code-block:: javascript
-
+    ...     'key':{'metadata.site':1,'metadata.page':1,'metadata.date':1}})
     { "collectionsharded" : "stats.monthly", "ok" : 1 }
 
 .. note::
diff --git a/source/use-cases/product-catalog.txt b/source/use-cases/product-catalog.txt
index 44e62658ff0..04939f6489d 100644
--- a/source/use-cases/product-catalog.txt
+++ b/source/use-cases/product-catalog.txt
@@ -494,11 +494,6 @@ Python/PyMongo console:
 
     >>> db.command('shardcollection', 'product', {
     ...     key : { 'type': 1, 'details.genre' : 1, 'sku':1 } })
-
-Upon success, you will see the following response:
-
-.. code-block:: javascript
-
     { "collectionsharded" : "details.genre", "ok" : 1 }
 
 .. note::
diff --git a/source/use-cases/storing-comments.txt b/source/use-cases/storing-comments.txt
index 7b41afda8da..8c99d5240b4 100644
--- a/source/use-cases/storing-comments.txt
+++ b/source/use-cases/storing-comments.txt
@@ -723,9 +723,6 @@ at the Python/PyMongo console:
 
     >>> db.command('shardcollection', 'comment_pages', {
     ...     key : { 'discussion_id' : 1, 'page': 1 } })
+    { "collectionsharded" : "comment_pages", "ok" : 1 }
 
-This will return the following response:
 
-.. code-block:: javascript
-
-    { "collectionsharded" : "comment_pages", "ok" : 1 }
From 3c6bc3aaf40a81eaf2f471dfd8197542a600cdb6 Mon Sep 17 00:00:00 2001
From: Rick Copeland
Date: Thu, 29 Mar 2012 17:20:38 -0400
Subject: [PATCH 05/17] Add more operations, include location and item collections

Signed-off-by: Rick Copeland
---
 .../use-cases/gaming-user-state.txt | 210 ++++++++++++++++++--------
 1 file changed, 163 insertions(+), 47 deletions(-)

diff --git a/source/applications/use-cases/gaming-user-state.txt b/source/applications/use-cases/gaming-user-state.txt
index 97117794b3f..b8c860d66ba 100644
--- a/source/applications/use-cases/gaming-user-state.txt
+++ b/source/applications/use-cases/gaming-user-state.txt
@@ -1,6 +1,6 @@
-=================================
-Online Gaming: Storing User State
-=================================
+===========================================
+Online Gaming: Creating a Role-Playing Game
+===========================================
 
 .. default-domain:: mongodb
 
@@ -8,35 +8,44 @@ Overview
 --------
 
 This document outlines the basic patterns and principals for using
-MongoDB as a persistent storage engine for user state data from an online
+MongoDB as a persistent storage engine for an online
 game, particularly one that contains role-playing characteristics.
 
 Problem
 ~~~~~~~
 
 In designing an online game, there is a need to store various
-data about the user's character. Some of the attributes might include:
+data about the player's character. Some of the attributes might include:
 
 Character attributes
    These might include intrinsic characteristics such as strength,
    dexterity, charisma, etc., as well as variable characteristics such
    as health, mana (if your game includes magic), etc.
 Character inventory
-   If your game includes the ability for the user to carry around
+   If your game includes the ability for the player to carry around
    objects, you will need to keep track of the items carried.
Character location / relationship to the game world - If your game allows the user to move their character from one + If your game allows the player to move their character from one location to another, this information needs to be stored as well. In addition, you need to store all this data for large numbers of -users who might be playing the game simultaneously, and this data +players who might be playing the game simultaneously, and this data needs to be both readable and writeable with minimal latency in order to ensure responsiveness during gameplay. +In addition to the above data, you also need to store data for the following: + +Items + These include various artifacts that the character might interact with, such as + weapons, armor, treasure, etc. +Locations + The various locations in which characters and items might find themselves, such + as rooms, halls, etc. + Another consideration when designing the persistence backend for an online game is its flexibility. Particularly in early releases of a game, you may wish to change gameplay mechanics significantly as you -receive feedback from your users. As you implement these changes, you +receive feedback from your players. As you implement these changes, you need to be able to migrate your persistent data from one format to another with minimal (or no) downtime. @@ -44,21 +53,24 @@ Solution ~~~~~~~~ The solution presented by this case study assumes that the read and -write performance for the user state is equally important and must be -accessible with minimal latency. +write performance is equally important and must be accessible with minimal +latency. Schema Design ~~~~~~~~~~~~~ Ultimately, the particulars of your schema depends on the particular -design of your game. When designing your schema for the user state, -you should attempt to encapsulate all the commonly used data into the -user object in order to minimize the number of queries to the database -and the number of seeks in a query.
If you can manage to -encapsulate all relevant user state into a single document, this +design of your game. When designing your schema, you should attempt to +encapsulate all the commonly used data into a small number of objects in order to +minimize the number of queries to the database and the number of seeks in a +query. Encapsulating all player state into a ``character`` collection, item data +into an ``item`` collection, and location data into a ``location`` collection satisfies both these criteria. -In a role-playing game, then, a typical user state document might look +Character Schema +```````````````` + +In a role-playing game, then, a typical character state document might look like the following: .. code-block:: javascript @@ -84,7 +96,7 @@ like the following: { id:ObjectId('...'), name:'grue' }, { id:ObjectId('...'), name:'Tim' } ], - contents: [ + inventory: [ { qty:1, id:ObjectId('...'), name:'scroll of cause fear' }] }, gold: 523, @@ -94,7 +106,7 @@ like the following: { id:ObjectId('...'), region:'feet'}], weapons: [ {id:ObjectId('...'), hand:'both'} ], inventory: [ - { qty:1, id:ObjectId('...'), name:'backpack', contents: [ + { qty:1, id:ObjectId('...'), name:'backpack', inventory: [ { qty:4, id:ObjectId('...'), name: 'potion of healing'}, { qty:1, id:ObjectId('...'), name: 'scroll of magic mapping'}, { qty:2, id:ObjectId('...'), name: 'c-rations'} ]}, @@ -108,7 +120,7 @@ There are a few things to note about this document: #. Information about the character's location in the game is encapsulated under the ``location`` attribute. Note in particular that all of the information - necessary to render the room is encapsulated within the user's state + necessary to render the room is encapsulated within the character state document. This allows the game system to render the room without making a second #query to the database to get room information. #. 
The ``armor`` and ``weapons`` attributes contain little information about the @@ -116,20 +128,73 @@ There are a few things to note about this document: the ``inventory`` property. Since the inventory information is stored in the same document, there is no need to replicate the detailed information about each item into the ``armor`` and ``weapons`` properties. -#. Finally, note that ``inventory`` contains the item details necessary for +#. ``inventory`` contains the item details necessary for rendering each item in the character's posession, including anyenchantments (``bonus``) and ``quantity``. Once again, embedding this data into the character record means you don't have to perform a separate query to fetch item details necessary for display. +Item Schema +``````````` + +Likewise, the item schema should include all details about all items globally in +the game: + +.. code-block:: javascript + + { + _id: ObjectId('...'), + name: 'backpack', + bonus: null, + inventory: [ + { qty:4, id:ObjectId('...'), name: 'potion of healing'}, + { qty:1, id:ObjectId('...'), name: 'scroll of magic mapping'}, + { qty:2, id:ObjectId('...'), name: 'c-rations'} ], + weight: 12, + price: 160, + ... + } + +Note that this document contains more or less the same information as stored in +the ``inventory`` attribute of ``character`` documents, as well as additional +data which may only be needed sporadically during gameplay, such as +``weight`` and ``price``. + +Location Schema +``````````````` + +Finally, the ``location`` schema specifies the state of the world in the game: + +.. 
code-block:: javascript + + { + id: 'maze-1', + description: 'a maze of twisty little passages...', + exits: {n:'maze-2', s:'maze-1', e:'maze-3'}, + players: [ + { id:ObjectId('...'), name:'grue' }, + { id:ObjectId('...'), name:'Tim' } ], + inventory: [ + { qty:1, id:ObjectId('...'), name:'scroll of cause fear' } ], + } + +Here, note that ``location`` stores exactly the same information as is stored in +the ``location`` attribute of the ``character`` document. You will use +``location`` as the system of record when the game requires interaction between +multiple characters or between characters and non-inventory items. + Operations ---------- -In an online gaming system with the character state stored in a single document, -the primary operations you'll be performing are querying for the character state -document by ``_id``, extracting relevant data for display, and updating various -attributes about the character. This section describes procedures for performing -these queries, extractions, and updates. +In an online gaming system with the state embedded in a single document for +``character``, ``item``, and ``location``, the primary operations you'll be +performing are querying for the character state by ``_id``, extracting relevant +data for display, and updating various attributes about the character. This +section describes procedures for performing these queries, extractions, and +updates. + +In particular you should try *not* to load the ``location`` or ``item`` documents +except when absolutely necessary. The examples that follow use the Python programming language and the :api:`PyMongo ` :term:`driver` for MongoDB, but you @@ -143,11 +208,11 @@ The most basic operation in this system is loading the character state. Query ````` -Use the following query to load the user document from MongoDB: +Use the following query to load the ``character`` document from MongoDB: .. 
code-block:: pycon - >>> character = db.characters.find_one({'_id': user_id}) + >>> character = db.character.find_one({'_id': character_id}) Index Support ````````````` @@ -158,11 +223,11 @@ sufficient for good performance of this query. Extract Armor and Weapon Data for Display ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In order to save space, the schema described above stores item details only in -the ``inventory`` attribute, storing ``ObjectId``\ s in other locations. To display -these item details, as on a character summary window, you need to merge the -information from the ``armor`` and ``weapons`` attributes with information from -the ``inventory`` attribute. +In order to save space, the ``character`` schema described above stores item +details only in the ``inventory`` attribute, storing ``ObjectId``\ s in other +locations. To display these item details, as on a character summary window, you +need to merge the information from the ``armor`` and ``weapons`` attributes with +information from the ``inventory`` attribute. Suppose, for instance, that your code is displaying the armor data using the following Jinja2 template: @@ -218,11 +283,13 @@ In order to build up this structure, use the following helper functions: result = {} for item in inventory: result[item['_id']] = item - if 'contents' in item: - result.update(get_item_index(item['contents'])) + if 'inventory' in item: + result.update(get_item_index(item['inventory'])) return result def describe_item(item): + '''Add a 'description' field to the given item''' + result = dict(item) if item['bonus']: description = '%+d %s' % (item['bonus'], item['name']) @@ -246,12 +313,12 @@ In order to actually display the armor, then, you would use the following code: .. code-block:: pycon >>> item_index = get_item_index( ... 
character['inventory'] + character['location']['inventory']) >>> armor = get_armor_for_dislay(character, item_index) Note in particular that you are building an index not only for the items the -character is actually carrying in inventory, but also for the items that the user -might interact with in the room. +character is actually carrying in inventory, but also for the items that the +player might interact with in the room. Similarly, in order to display the weapon information, you need to build a structure such as the following: @@ -299,10 +366,10 @@ data, then, is the following: >>> inventory = character['inventory'] >>> room_data = character['location'] -Update Character Inventory -~~~~~~~~~~~~~~~~~~~~~~~~~~ +Pick Up an Item From a Room +~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In your game, suppose the user decides to pick up an item from the room and add +In your game, suppose the player decides to pick up an item from the room and add it to their inventory. In this case, you need to update both the character state and the global location state: @@ -316,10 +383,10 @@ and the global location state: db.character.update( { '_id': character['_id'] }, { '$push': { 'inventory': item }, - '$pull': { 'location.contents': { '_id': item['id'] } } }) + '$pull': { 'location.inventory': { '_id': item['id'] } } }) db.location.update( { '_id': character['location']['id'] }, - { '$pull': { 'contents': { 'id': item_id } } }) + { '$pull': { 'inventory': { 'id': item_id } } }) While the above code may be for a single-player game, if you allow multiple players, or non-player characters, to pick up items, that introduces a problem in @@ -336,8 +403,8 @@ this case, the code is now the following: character['inventory'].append(item) result = db.location.update( { '_id': character['location']['id'], - 'contents.id': item_id }, - { '$pull': { 'contents': { 'id': item_id } } }, + 'inventory.id': item_id }, + { '$pull': { 'inventory': { 'id': item_id } } }, safe=True) if not 
result['updatedExisting']: raise Conflict() @@ -350,10 +417,58 @@ By ensuring that the item is present before removing it from the room in the ``update`` call above, you guarantee that only one player/non-player character/monster can pick up the item. +Remove an Item from a Container +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In the game described here, the ``backpack`` item can contain other +items. You might further suppose that some other items may be similarly +hierarchical (e.g. a chest in a room). Suppose that the player wishes to move an +item from one of these "containers" into their active ``inventory`` as a prelude +to using it. In this case, you need to update both the character state and the +item state: + +.. code-block:: python + + def move_to_active_inventory(character, item_index, container_id, item_id): + '''Transfer an item from the given container to the character's active + inventory + ''' + + result = db.item.update( + { '_id': container_id, + 'inventory.id': item_id }, + { '$pull': { 'inventory': { 'id': item_id } } }, + safe=True) + if not result['updatedExisting']: + raise Conflict() + item = item_index[item_id] + container = item_index[container_id] + character['inventory'].append(item) + container['inventory'] = [ + entry for entry in container['inventory'] + if entry['_id'] != item_id ] + db.character.update( + { '_id': character['_id'] }, + { '$push': { 'inventory': item } } ) + db.character.update( + { '_id': character['_id'], 'inventory.id': container_id }, + { '$pull': { 'inventory.$.inventory': { 'id': item_id } } } ) + +Note in the code above that you: + +- Ensure that the item's state makes this update reasonable (the item is + actually contained within the container). Abort with an error if this is not + true. +- Update the in-memory ``character`` document's inventory, adding the item. +- Update the in-memory ``container`` document's inventory, removing the item. +- Update the ``character`` document in MongoDB.
+- In the case that the character is moving an item from a container *in his own + inventory*, update the character's inventory representation of the container. + Move the Character to a Different Room ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In your game, suppose the user decides to move north. In this case, you need to +In your game, suppose the player decides to move north. In this case, you need to update the character state to match the new location: .. code-block:: python @@ -389,11 +504,12 @@ gold, and update the room: .. code-block:: python - def buy(character, shopkeeper, item_id, price): + def buy(character, shopkeeper, item_id): '''Pick up an item, add to the character's inventory, and transfer payment to the shopkeeper ''' + price = db.item.find_one({'_id': item_id}, {'price':1})['price'] result = db.character.update( { '_id': character['_id'], 'gold': { '$gte': price } }, From 9e9669e2c03d2ed855edcbf18b6ba0ab0c7e83d9 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Wed, 4 Apr 2012 17:02:28 -0400 Subject: [PATCH 06/17] Fix spelling error Signed-off-by: Rick Copeland --- source/applications/use-cases/gaming-user-state.txt | 2 +- source/applications/use-cases/use-case-template.txt | 2 +- source/use-cases/product-catalog.txt | 2 +- source/use-cases/storing-log-data.txt | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/source/applications/use-cases/gaming-user-state.txt b/source/applications/use-cases/gaming-user-state.txt index b8c860d66ba..e3f84e5cb81 100644 --- a/source/applications/use-cases/gaming-user-state.txt +++ b/source/applications/use-cases/gaming-user-state.txt @@ -7,7 +7,7 @@ Online Gaming: Creating a Role-Playing Game Overview -------- -This document outlines the basic patterns and principals for using +This document outlines the basic patterns and principles for using MongoDB as a persistent storage engine for an online game, particularly one that contains role-playing characteristics. 
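The in-memory bookkeeping performed by ``move_to_active_inventory`` above can be exercised without a database. The sketch below is a minimal version under some assumptions, not the document's own code:

- The helper names ``build_item_index`` and ``move_in_memory`` are hypothetical.
- Items are keyed by ``'id'``, as in the inventory entries of the character schema.
- Short strings stand in for ``ObjectId`` values.

It also names the comprehension variable ``entry`` rather than ``item``.

```python
def build_item_index(inventory):
    """Map each item's 'id' to its dict, descending into nested 'inventory'
    lists such as a backpack's contents."""
    index = {}
    for item in inventory:
        index[item['id']] = item
        if 'inventory' in item:
            index.update(build_item_index(item['inventory']))
    return index


def move_in_memory(character, item_index, container_id, item_id):
    """Move ``item_id`` out of a container's nested inventory and append it
    to the character's top-level inventory (in-memory documents only)."""
    container = item_index[container_id]
    item = item_index[item_id]
    # 'entry' rather than 'item': under Python 2 a list comprehension's loop
    # variable leaks into the enclosing scope and would overwrite ``item``.
    remaining = [entry for entry in container['inventory']
                 if entry['id'] != item_id]
    if len(remaining) == len(container['inventory']):
        raise KeyError('%r is not inside container %r' % (item_id, container_id))
    container['inventory'] = remaining
    character['inventory'].append(item)
    return item


character = {
    'inventory': [
        {'qty': 1, 'id': 'backpack', 'name': 'backpack', 'inventory': [
            {'qty': 4, 'id': 'potion', 'name': 'potion of healing'},
            {'qty': 2, 'id': 'rations', 'name': 'c-rations'}]},
    ],
}
index = build_item_index(character['inventory'])
move_in_memory(character, index, 'backpack', 'potion')
print([i['name'] for i in character['inventory']])
```

Raising when nothing was removed plays the same role as the ``updatedExisting`` check in the database version: a second attempt to take the same item out of the container fails instead of silently duplicating it.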
diff --git a/source/applications/use-cases/use-case-template.txt b/source/applications/use-cases/use-case-template.txt index 9a8dcef8b44..da5a99f2269 100644 --- a/source/applications/use-cases/use-case-template.txt +++ b/source/applications/use-cases/use-case-template.txt @@ -9,7 +9,7 @@ TODO: Section: Title Overview -------- -This document outlines the basic patterns and principals for using +This document outlines the basic patterns and principles for using MongoDB as a persistent storage engine for TODO: what are we building? Problem diff --git a/source/use-cases/product-catalog.txt b/source/use-cases/product-catalog.txt index 04939f6489d..f1458e28303 100644 --- a/source/use-cases/product-catalog.txt +++ b/source/use-cases/product-catalog.txt @@ -7,7 +7,7 @@ Product Catalog Overview -------- -This document describes the basic patterns and principals for +This document describes the basic patterns and principles for designing an E-Commerce product catalog system using MongoDB as a storage engine. diff --git a/source/use-cases/storing-log-data.txt b/source/use-cases/storing-log-data.txt index 81a63bf9360..fc015e5bcfa 100644 --- a/source/use-cases/storing-log-data.txt +++ b/source/use-cases/storing-log-data.txt @@ -7,7 +7,7 @@ Storing Log Data Overview -------- -This document outlines the basic patterns and principals for using +This document outlines the basic patterns and principles for using MongoDB as a persistent storage engine for log data from servers and other machine data. 
From 7621bbe10b32f0f75db58a6b8afb9e6738f3a5dd Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Wed, 4 Apr 2012 18:10:03 -0400 Subject: [PATCH 07/17] Begin advertising use case Signed-off-by: Rick Copeland --- .../use-cases/ad-campaign-management.txt | 149 ++++++++++++++++++ source/applications/use-cases/index.txt | 8 + .../use-cases/use-case-template.txt | 2 + 3 files changed, 159 insertions(+) create mode 100644 source/applications/use-cases/ad-campaign-management.txt diff --git a/source/applications/use-cases/ad-campaign-management.txt b/source/applications/use-cases/ad-campaign-management.txt new file mode 100644 index 00000000000..609436534b4 --- /dev/null +++ b/source/applications/use-cases/ad-campaign-management.txt @@ -0,0 +1,149 @@ +.. -*- rst -*- + +======================================= +Online Advertising: Campaign Management +======================================= + +.. default-domain:: mongodb + +Overview +-------- + +This document outlines the basic patterns and principles for using +MongoDB as a persistent storage engine for an online advertising network. In +particular, this document focuses on creating and maintaining an advertising +campaign with a preset daily budget and a cost per click (CPC) or cost per +thousand impressions (CPM) limit. + +Problem +~~~~~~~ + +You want to create an advertising network that will serve ads to a variety of +online media. As part of this ad serving, you want to track which ads are +available to be served, based on both the daily budget and the CPC and CPM +limits. + +As part of a campaign, a customer creates one or more *zones*, where +each zone represents some location on a group of pages. Each zone in a campaign +then has one or more ads assigned to it. + +Solution +~~~~~~~~ + +In this solution, you will store each campaign's metadata (budget, limits, and +targets) in its own document, and its ongoing statistics in a separate document. This data can be +modified before the campaign starts or during the campaign itself.
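As a rough sketch of how the stored budget might decide whether a campaign may still be served on a given day, the functions below work on plain dictionaries. They assume the field names of the schema that follows: ``begin``, ``end``, and ``daily_budget`` on the metadata document, and a ``daily_spent`` map keyed by ``'YYYY-MM-DD'`` on the statistics document. The document does not state the budget's units, so the sketch only assumes that ``daily_spent`` uses the same units as ``daily_budget``. The function names are hypothetical, and the 2012-08-03 spend figure is invented for illustration.

```python
from datetime import datetime


def remaining_budget(metadata, stats, day):
    """Budget left for ``day`` (a datetime).

    Assumes ``stats['daily_spent']`` is keyed by 'YYYY-MM-DD' strings and is
    expressed in the same units as ``metadata['daily_budget']``; a missing
    entry is treated as nothing spent yet."""
    spent = stats.get('daily_spent', {}).get(day.strftime('%Y-%m-%d'), 0)
    return metadata['daily_budget'] - spent


def campaign_is_active(metadata, stats, day):
    """True if ``day`` falls between the campaign's begin and end dates
    (compared by calendar date) and some of the day's budget remains."""
    in_flight = metadata['begin'].date() <= day.date() <= metadata['end'].date()
    return in_flight and remaining_budget(metadata, stats, day) > 0


metadata = {
    'title': 'August Shoes Campaign',
    'begin': datetime(2012, 8, 1),
    'end': datetime(2012, 8, 31),
    'daily_budget': 25000,
}
stats = {
    'daily_spent': {
        '2012-08-01': 23847,
        '2012-08-02': 20749,
        '2012-08-03': 25000,  # hypothetical: budget exhausted for the day
    },
}

print(remaining_budget(metadata, stats, datetime(2012, 8, 1)))  # 1153
print(campaign_is_active(metadata, stats, datetime(2012, 8, 3)))  # False
```

The dates are compared by calendar day because the sample ``end`` value is midnight on August 31. A plain datetime comparison would otherwise drop the campaign for most of its last day.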
+ +Schema Design +~~~~~~~~~~~~~ + +The schema for campaign management consists of two collections: one that +stores campaign metadata and another for campaign statistics. Documents in the +campaign metadata collection, ``campaign.metadata``, have the following +format: + +.. code-block:: javascript + + { + _id: ObjectId(...), + customer_id: ObjectId(...), + title: "August Shoes Campaign", + begin: ISODate("2012-08-01T00:00:00Z"), + end: ISODate("2012-08-31T00:00:00Z"), + zones: { + z1: { + site: 'cnn.com', + page: 'stories/shoes/.*', zone: 'banner', + limit: { type: 'cpm', value: 2000 }, + ad_ids: [ 'ad1', 'ad2' ] }, + z2: { + site: 'cnn.com', + page: 'stories/shoes/.*', zone: 'tower-1a', + limit: { type: 'cpc', value: 45 }, + ad_ids: [ 'ad3' ] } }, + daily_budget: 25000 + } + +The statistics are stored in their own collection ``campaign.stats``: + +.. code-block:: javascript + + { + _id: ObjectId(...), // same as campaign ID + zones: { + z1: { + daily_stats: { + '2012-08-01': { + total: { impressions: 10146, clicks: 198, conversions: 16 }, + ad1: { impressions: ... }, + ad2: { impressions: ... } }, + '2012-08-02': { + total: { impressions: 9182, clicks: 183, conversions: 18 }, + ad1: { impressions: ... }, + ad2: { impressions: ... } }, + '2012-08-03': { + total: { impressions: 9784, clicks: 202, conversions: 21 }, + ad1: { impressions: ... }, + ad2: { impressions: ... } }, + ... + '2012-08-31': { + total: { impressions: 0, clicks: 0, conversions: 0 }, + ad1: { impressions: 0, clicks: 0, conversions: 0 }, + ad2: { impressions: 0, clicks: 0, conversions: 0 } } } + }, + z2: { + daily_stats: { + '2012-08-01': { + total: { impressions: 10457, clicks: 79, conversions: 14 }, + ad3: { impressions: ... } }, + '2012-08-02': { + total: { impressions: 9283, clicks: 53, conversions: 8 }, + ad3: { impressions: ... } }, + '2012-08-03': { + total: { impressions: 9197, clicks: 72, conversions: 14 }, + ad3: { impressions: ... } }, + ... 
+ '2012-08-31': { + total: { impressions: 0, clicks: 0, conversions: 0 }, + ad3: { impressions: 0, clicks: 0, conversions: 0 } } } + } }, + daily_spent: { + '2012-08-01': 23847, + '2012-08-02': 20749, + ... + '2012-08-12': 0, + '2012-08-13': 0, + ... + '2012-08-31': 0 } + } + +Operations +---------- + +TODO: summary of the operations section + +The examples that follow use the Python programming language and the +:api:`PyMongo ` :term:`driver` for MongoDB, but you +can implement this system using any language you choose. + +Operation 1 +~~~~~~~~~~~ + +TODO: describe what the operation is (optional) + +Query +````` + +TODO: describe query + +Index Support +````````````` + +TODO: describe indexes to optimize this query + +Sharding +-------- + +.. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding`" wiki + page. diff --git a/source/applications/use-cases/index.txt b/source/applications/use-cases/index.txt index 5aa6023e118..86b7c52d8db 100644 --- a/source/applications/use-cases/index.txt +++ b/source/applications/use-cases/index.txt @@ -41,3 +41,11 @@ Online Gaming :maxdepth: 2 gaming-user-state + +Online Advertising +------------------ + +.. 
-*- rst -*- + :orphan: ==================== From ad9ddd57b5a69a8385654f7dc803cff6974a29de Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Fri, 6 Apr 2012 14:00:54 -0400 Subject: [PATCH 08/17] Creating ad serving use case Signed-off-by: Rick Copeland --- .../applications/use-cases/ad-serving-ads.txt | 294 ++++++++++++++++++ source/applications/use-cases/index.txt | 1 + source/use-cases/storing-comments.txt | 2 +- 3 files changed, 296 insertions(+), 1 deletion(-) create mode 100644 source/applications/use-cases/ad-serving-ads.txt diff --git a/source/applications/use-cases/ad-serving-ads.txt b/source/applications/use-cases/ad-serving-ads.txt new file mode 100644 index 00000000000..24475c3b133 --- /dev/null +++ b/source/applications/use-cases/ad-serving-ads.txt @@ -0,0 +1,294 @@ +.. -*- rst -*- + +============================== +Online Advertising: Ad Serving +============================== + +.. default-domain:: mongodb + +Overview +-------- + +This document outlines the basic patterns and principles for using +MongoDB as a persistent storage engine for an online advertising network. In +particular, this document focuses on the task of deciding *which* ad to serve +when a user visits a particular site. + +Problem +~~~~~~~ + +You want to create an advertising network that will serve ads to a variety of +online media sites. As part of this ad serving, you want to track which ads are +available to be served, and decide on a particular ad to be served in a +particular zone. + +Solution +~~~~~~~~ + +This solution is structured as a progressive refinement of the ad network, +starting out with the basic data storage requirements and adding +features to the schema to support more advanced ad targeting. The key performance +criterion for this solution is the latency between receiving an ad request and +returning the (targeted) ad to be displayed.
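The selection step at the heart of this latency-sensitive path can be sketched without a database: order a zone's ads by ``ecpm``, break ties at random, and return the first ad that passes any targeting check. The sketch below is a minimal illustration under a few assumptions:

- The function names ``ads_by_desirability`` and ``pick_ad`` are hypothetical.
- The ``is_acceptable`` hook is a stand-in for a targeting rule such as a frequency cap.
- The ads are sorted explicitly rather than relying on the stored list already being in descending ``ecpm`` order.

```python
import random
from itertools import groupby


def ads_by_desirability(ads, rng=random):
    """Yield ads from highest to lowest ecpm, shuffling ads that share the
    same ecpm so ties are served in random order."""
    ordered = sorted(ads, key=lambda ad: ad['ecpm'], reverse=True)
    for _ecpm, group in groupby(ordered, key=lambda ad: ad['ecpm']):
        group = list(group)
        rng.shuffle(group)
        for ad in group:
            yield ad


def pick_ad(ads, is_acceptable=lambda ad: True, rng=random):
    """Return the most desirable ad that passes is_acceptable, or None."""
    for ad in ads_by_desirability(ads, rng):
        if is_acceptable(ad):
            return ad
    return None


zone_ads = [
    {'campaign_id': 'bmw:c201204_eclass_1', 'ad_unit_id': 'banner12', 'ecpm': 200},
    {'campaign_id': 'mercedes:c201204_sclass_4', 'ad_unit_id': 'banner23a', 'ecpm': 250},
    {'campaign_id': 'mercedes:c201204_sclass_4', 'ad_unit_id': 'banner23b', 'ecpm': 250},
]
print(pick_ad(zone_ads)['ad_unit_id'])  # banner23a or banner23b
```

Because ``ads_by_desirability`` is a generator, a rejected ad costs only one more step of iteration. No further database work is needed to fall back to the next-best ad.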
+ +The examples that follow use the Python programming language and the +:api:`PyMongo ` :term:`driver` for MongoDB, but you +can implement this system using any language you choose. + +Design 1: Basic Ad Serving +-------------------------- + +A basic ad serving algorithm consists of the following steps: + +#. The network receives a request for an ad, specifying at a minimum the + ``site_id`` and ``zone_id`` to be served. +#. The network consults its inventory of ads available to display and chooses an + ad based on various business rules. +#. The network returns the actual ad to be displayed, possibly recording the + decision made as well. + +This design uses the ``site_id`` and ``zone_id`` submitted with the ad request, +as well as information stored in the ad inventory collection, to make the ad +targeting decisions. Later examples will build on this, allowing more advanced ad +targeting. + +Schema Design +~~~~~~~~~~~~~ + +A very basic schema for storing ads available to be served consists of a single +collection, ``ad.inventory``: + +.. code-block:: javascript + + { + _id: ObjectId(...), + site_id: 'cnn', + zone_id: 'banner', + ads: [ + { campaign_id: 'mercedes:c201204_sclass_4', + ad_unit_id: 'banner23a', + ecpm: 250 }, + { campaign_id: 'mercedes:c201204_sclass_4', + ad_unit_id: 'banner23b', + ecpm: 250 }, + { campaign_id: 'bmw:c201204_eclass_1', + ad_unit_id: 'banner12', + ecpm: 200 }, + ... ] + } + +Note that for each (``site_id``, ``zone_id``) combination you'll be storing a list of +ads, sorted in descending order of ``ecpm``. + +Choosing an Ad to Serve +~~~~~~~~~~~~~~~~~~~~~~~ + +The query you'll use to choose which ad to serve fetches the ads stored for the +requested zone and picks one of those with the highest ``ecpm`` bid (breaking +ties at random) in order to maximize the ad network's profits: + +.. 
code-block:: python + + from itertools import groupby + from random import choice + + def choose_ad(site_id, zone_id): + site = db.ad.inventory.find_one({ + 'site_id': site_id, 'zone_id': zone_id}) + if site is None: return None + if len(site['ads']) == 0: return None + ecpm_groups = groupby(site['ads'], key=lambda ad:ad['ecpm']) + ecpm, ad_group = next(ecpm_groups) + return choice(list(ad_group)) + +Index Support +````````````` + +In order to execute the ad choice with the lowest latency possible, you'll want +to have a compound index on (``site_id``, ``zone_id``): + +.. code-block:: pycon + + >>> db.ad.inventory.ensure_index([ + ... ('site_id', 1), + ... ('zone_id', 1) ]) + +Making an Ad Campaign Inactive +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +One case you'll have to deal with in this solution is making a campaign +inactive. This may happen for a variety of reasons. For instance, the campaign +may have reached its end date or exhausted its budget for the current time +period. In this case, the logic is fairly straightforward: + +.. code-block:: python + + def deactivate_campaign(campaign_id): + db.ad.inventory.update( + { 'ads.campaign_id': campaign_id }, + { '$pull': { 'ads': { 'campaign_id': campaign_id } } }, + multi=True) + +The update statement above first selects only those ad zones which have available +ads from the given ``campaign_id`` and then uses the ``$pull`` modifier to remove +them from rotation. + +Index Support +````````````` + +In order to execute the multi-update quickly, you should maintain an index on the +``ads.campaign_id`` field: + +.. code-block:: pycon + + >>> db.ad.inventory.ensure_index('ads.campaign_id') + +Sharding +~~~~~~~~ + +In order to scale beyond the capacity of a single replica set, you will need to +shard the ``ad.inventory`` collection. To maintain the lowest possible latency in +the ad selection operation, the :term:`shard key` needs to be chosen to allow +MongoDB to route the ``ad.inventory`` query to a single shard.
In this case, a +good approach is to shard on the (``site_id``, ``zone_id``) combination: + +.. code-block:: pycon + + >>> db.command('shardcollection', 'ad.inventory', { + ... 'key': {'site_id': 1, 'zone_id': 1} }) + { "collectionsharded": "ad.inventory", "ok": 1 } + +Design 2: Adding Frequency Capping +---------------------------------- + +One problem with the logic described in Design 1 above is that it will tend to +display the same ad over and over again until the campaign's budget is +exhausted. To mitigate this, advertisers may wish to limit the frequency with +which a given user is shown a particular ad. This process is called frequency +capping and is an example of user profile targeting in advertising. + +In order to perform frequency capping (or any type of user targeting), the ad +network needs to maintain a profile for each visitor, typically implemented as a +cookie in the user's browser. This cookie, effectively a ``user_id``, is then +transmitted to the ad network when logging impressions, clicks, conversions, +etc., as well as when requesting an ad. This section focuses on how that +profile data impacts the ad serving decision. + +Schema Design +~~~~~~~~~~~~~ + +In order to use the user profile data, you need to store it. In this case, it's +stored in a collection ``ad.user``: + +.. code-block:: javascript + + { + _id: 'cookie_value', + advertisers: { + mercedes: { + impressions: [ + { date: ISODate(...), + campaign: 'c201204_sclass_4', + ad_unit_id: 'banner23a', + site_id: 'cnn', + zone_id: 'banner' }, + ... ], + clicks: [ + { date: ISODate(...), + campaign: 'c201204_sclass_4', + ad_unit_id: 'banner23a', + site_id: 'cnn', + zone_id: 'banner' }, + ... ] }, + bmw: { ... }, + ... + } + } + +There are a few things to note about the user profile: + +- Profile information is segmented by advertiser. Typically advertising data is + sensitive competitive information that can't be shared among advertisers, so + this must be kept separate.
+- All data is embedded in a single profile document. When you need to query this
+  data (detailed below), you don't necessarily know which advertiser's ads you'll
+  be showing, so it's a good practice to embed all advertisers in a single
+  document.
+- The event information is grouped by event type within an advertiser, and sorted
+  by timestamp. This allows rapid lookups of a stream of a particular type of
+  event.
+
+Choosing an Ad to Serve
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The query you'll use to choose which ad to serve now needs to iterate through
+ads in order of desirability and select the "best" ad that also satisfies the
+advertiser's targeting rules (in this case, the frequency cap):
+
+.. code-block:: python
+
+    from itertools import groupby
+    from random import shuffle
+    from datetime import datetime, timedelta
+
+    def choose_ad(site_id, zone_id, user_id):
+        site = db.ad.inventory.find_one({
+            'site_id': site_id, 'zone_id': zone_id})
+        if site is None or len(site['ads']) == 0: return None
+        ads = ad_iterator(site['ads'])
+        user = db.ad.user.find_one({'_id': user_id})
+        if user is None:
+            # any ad is acceptable for an unknown user
+            return next(ads, None)
+        for ad in ads:
+            advertiser_id = ad['campaign_id'].split(':', 1)[0]
+            profile = user.get('advertisers', {}).get(advertiser_id, {})
+            if ad_is_acceptable(ad, profile): return ad
+        return None
+
+    def ad_iterator(ads):
+        '''Find available ads, sorted by ecpm, with random sort for ties'''
+        ecpm_groups = groupby(ads, key=lambda ad:ad['ecpm'])
+        for ecpm, ad_group in ecpm_groups:
+            ad_group = list(ad_group)
+            shuffle(ad_group)
+            for ad in ad_group: yield ad
+
+    def ad_is_acceptable(ad, profile):
+        '''Returns False if the user has seen this ad unit in the last day'''
+        threshold = datetime.utcnow() - timedelta(days=1)
+        for event in reversed(profile.get('impressions', [])):
+            if event['date'] < threshold: break
+            if event['ad_unit_id'] == ad['ad_unit_id']:
+                return False
+        return True
+
+Here, the ``choose_ad()`` function provides the framework for your ad selection
+process. The ``site`` is fetched first, and its ``ads`` are passed to the
+``ad_iterator()`` function, which yields ads in order of desirability. Each ad
+is then checked with the ``ad_is_acceptable()`` function to determine whether
+it meets the advertiser's rules.
+
+The ``ad_is_acceptable()`` function iterates over the ``impressions`` stored for
+that advertiser in the user profile, from most recent to oldest, stopping at the
+``threshold`` time (shown here as 1 day ago). If the same ``ad_unit_id``
+appears in the impression stream, the ad is rejected. Otherwise it is acceptable
+and can be shown to the user.
+
+Index Support
+`````````````
+
+In order to retrieve the user profile with the lowest latency possible, there
+needs to be an index on the ``_id`` field, which MongoDB supplies by default.
+
+Sharding
+~~~~~~~~
+
+When sharding the ``ad.user`` collection, choosing the ``_id`` field as a
+:term:`shard key` allows MongoDB to route queries and updates for a profile to a single shard:
+
+.. code-block:: pycon
+
+    >>> db.command('shardcollection', 'ad.user', {
+    ...     'key': {'_id': 1 } })
+    { "collectionsharded": "ad.user", "ok": 1 }
diff --git a/source/applications/use-cases/index.txt b/source/applications/use-cases/index.txt
index 86b7c52d8db..b754bb89bd5 100644
--- a/source/applications/use-cases/index.txt
+++ b/source/applications/use-cases/index.txt
@@ -48,4 +48,5 @@ Online Advertising
 .. toctree::
    :maxdepth: 2
 
+   ad-serving-ads
    ad-campaign-management
diff --git a/source/use-cases/storing-comments.txt b/source/use-cases/storing-comments.txt
index 8c99d5240b4..c70757f8195 100644
--- a/source/use-cases/storing-comments.txt
+++ b/source/use-cases/storing-comments.txt
@@ -701,7 +701,7 @@ at the Python/PyMongo console:
 .. code-block:: pycon
 
    >>> db.command('shardcollection', 'comments', {
-   ...     key : { 'discussion_id' : 1, 'full_slug': 1 } })
+   ...
'key' : { 'discussion_id' : 1, 'full_slug': 1 } }) This will return the following response: From 1c2137bd11fdf5990b4d3cde0690bf018c8ce8f3 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Fri, 6 Apr 2012 17:59:41 -0400 Subject: [PATCH 09/17] Add keyword targeting to the ad network Signed-off-by: Rick Copeland --- .../applications/use-cases/ad-serving-ads.txt | 111 ++++++++++++++++-- 1 file changed, 102 insertions(+), 9 deletions(-) diff --git a/source/applications/use-cases/ad-serving-ads.txt b/source/applications/use-cases/ad-serving-ads.txt index 24475c3b133..20afcac0b59 100644 --- a/source/applications/use-cases/ad-serving-ads.txt +++ b/source/applications/use-cases/ad-serving-ads.txt @@ -93,7 +93,7 @@ profits: from random import choice def choose_ad(site_id, zone_id): - site = db.ad.inventory.find_one({ + site = db.ad.zone.find_one({ 'site_id': site_id, 'zone_id': zone_id}) if site is None: return None if len(site['ads']) == 0: return None @@ -109,7 +109,7 @@ to have a compound index on (``site_id``, ``zone_id``): .. code-block:: pycon - >>> db.ad.inventory.ensure_index([ + >>> db.ad.zone.ensure_index([ ... ('site_id', 1), ... ('zone_id', 1) ]) @@ -124,7 +124,7 @@ period. In this case, the logic is fairly straightforward: .. code-block:: python def deactivate_campaign(campaign_id): - db.ad.inventory.update( + db.ad.zone.update( { 'ads.campaign_id': campaign_id }, {' $pull': { 'ads', { 'campaign_id': campaign_id } } }, multi=True) @@ -141,22 +141,22 @@ In order to execute the multi-update quickly, you should maintain an index on th .. code-block:: pycon - >>> db.ad.inventory.ensure_index('ads.campaign_id') + >>> db.ad.zone.ensure_index('ads.campaign_id') Sharding ~~~~~~~~ In order to scale beyond the capacity of a single replica set, you will need to -shard the ``ad.inventory`` collection. To maintain the lowest possible latency in +shard the ``ad.zone`` collection. 
To maintain the lowest possible latency in the ad selection operation, the :term:`shard key` needs to be chosen to allow -MongoDB to route the ``ad.inventory`` query to a single shard. In this case, a +MongoDB to route the ``ad.zone`` query to a single shard. In this case, a good approach is to shard on the (``site_id``, ``zone_id``) combination: .. code-block:: pycon - >>> db.command('shardcollection', 'ad.inventory', { + >>> db.command('shardcollection', 'ad.zone', { ... 'key': {'site_id': 1, 'zone_id': 1} }) - { "collectionsharded": "ad.inventory", "ok": 1 } + { "collectionsharded": "ad.zone", "ok": 1 } Design 2: Adding Frequency Capping ---------------------------------- @@ -232,7 +232,7 @@ advertiser's targeting rules (in this case, the frequency cap): from datetime import datetime, timedelta def choose_ad(site_id, zone_id, user_id): - site = db.ad.inventory.find_one({ + site = db.ad.zone.find_one({ 'site_id': site_id, 'zone_id': zone_id}) if site is None or len(site['ads']) == 0: return None ads = ad_iterator(site['ads']) @@ -292,3 +292,96 @@ When sharding the ``ad.user`` collection, choosing the ``_id`` field as a >>> db.command('shardcollection', 'ad.user', { ... 'key': {'_id': 1 } }) { "collectionsharded": "ad.user", "ok": 1 } + +Design 3: Keyword Targeting +--------------------------- + +Where frequency capping above is an example of user profile targeting, you may +also wish to perform content targeting so that the user receives relevant ads for +the particular page being viewed. The simplest example of this is targeting ads +at the result of a search query. In this case, a list of ``keywords`` is sent to +the ``choose_ad()`` call along with the ``site_id``, ``zone_id``, and +``user_id``. + + +Schema Design +~~~~~~~~~~~~~ + +In order to choose relevant ads, you'll need to expand the ``ad.zone`` collection +to store keywords for each ad: + +.. 
code-block:: javascript
+
+    {
+      _id: ObjectId(...),
+      site_id: 'cnn',
+      zone_id: 'search',
+      ads: [
+        { campaign_id: 'mercedes:c201204_sclass_4',
+          ad_unit_id: 'search1',
+          keywords: [ 'car', 'luxury', 'style' ],
+          ecpm: 250 },
+        { campaign_id: 'mercedes:c201204_sclass_4',
+          ad_unit_id: 'search2',
+          keywords: [ 'car', 'luxury', 'style' ],
+          ecpm: 250 },
+        { campaign_id: 'bmw:c201204_eclass_1',
+          ad_unit_id: 'search1',
+          keywords: [ 'car', 'performance' ],
+          ecpm: 200 },
+        ... ]
+    }
+
+Choosing a Group of Ads to Serve
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In the approach described here, you'll choose a number of ads that match the
+keywords used in the search, so the code below has been modified to return an
+iterator over ads in descending order of preference:
+
+.. code-block:: python
+
+    import math
+
+    def choose_ads(site_id, zone_id, user_id, keywords):
+        site = db.ad.zone.find_one({
+            'site_id': site_id, 'zone_id': zone_id})
+        if site is None: return []
+        ads = ad_iterator(site['ads'], keywords)
+        user = db.ad.user.find_one({'_id': user_id})
+        if user is None: return ads
+        advertisers = user.get('advertisers', {})
+        def profile_for(ad):
+            # campaign ids are prefixed with the advertiser id ('mercedes:...')
+            advertiser_id = ad['campaign_id'].split(':', 1)[0]
+            return advertisers.get(advertiser_id, {})
+        return (ad for ad in ads
+                if ad_is_acceptable(ad, profile_for(ad)))
+
+    def ad_iterator(ads, keywords):
+        '''Find available ads, sorted by score, with random sort for ties'''
+        keywords = set(keywords)
+        scored_ads = [(ad_score(ad, keywords), ad) for ad in ads]
+        by_score = lambda pair: pair[0]
+        # sort on the score only (highest first), so ads are never compared
+        score_groups = groupby(
+            sorted(scored_ads, key=by_score, reverse=True), key=by_score)
+        for score, group in score_groups:
+            ad_group = [ad for _, ad in group]
+            shuffle(ad_group)
+            for ad in ad_group: yield ad
+
+    def ad_score(ad, keywords):
+        '''Compute a desirability score based on the ad ecpm and keywords'''
+        matching = set(ad['keywords']).intersection(keywords)
+        return ad['ecpm'] * math.log(1.1 + len(matching))
+
+    # ad_is_acceptable() is the frequency-capping check defined in Design 2
+
+The main thing to note in the code above is that ads must
now be sorted according
+to some ``score`` which in this case is computed based on a combination of the
+``ecpm`` of the ad as well as the number of keywords matched. More advanced use
+cases may boost the importance of various keywords, but this goes beyond the
+scope of this use case. Keep in mind that, because ads are now sorted at display
+time, performance may suffer if there are a large number of ads competing for
+the same display slot.

From 6514330c119f03e6e1d54ac8e89db02d9d938017 Mon Sep 17 00:00:00 2001
From: Rick Copeland
Date: Wed, 11 Apr 2012 19:11:01 -0400
Subject: [PATCH 10/17] Adding social graph and update document (partial, working through schema design)

Signed-off-by: Rick Copeland
---
 source/applications/use-cases/index.txt | 8 +
 .../use-cases/social-user-profile.txt | 224 ++++++++++++++++++
 2 files changed, 232 insertions(+)
 create mode 100644 source/applications/use-cases/social-user-profile.txt

diff --git a/source/applications/use-cases/index.txt b/source/applications/use-cases/index.txt
index b754bb89bd5..53c4f96c0b6 100644
--- a/source/applications/use-cases/index.txt
+++ b/source/applications/use-cases/index.txt
@@ -50,3 +50,11 @@ Online Advertising
 
    ad-serving-ads
    ad-campaign-management
+
+Social Networking
+-----------------
+
+.. toctree::
+   :maxdepth: 2
+
+   social-user-profile
diff --git a/source/applications/use-cases/social-user-profile.txt b/source/applications/use-cases/social-user-profile.txt
new file mode 100644
index 00000000000..f5ce17304a2
--- /dev/null
+++ b/source/applications/use-cases/social-user-profile.txt
@@ -0,0 +1,224 @@
+.. -*- rst -*-
+
+==================================
+Social Networking: Storing Updates
+==================================
+
+.. default-domain:: mongodb
+
+Overview
+--------
+
+This document outlines the basic patterns and principles for using
+MongoDB as a persistent storage engine for a social network.
In particular, this
+document focuses on the task of storing and displaying user updates.
+
+Problem
+~~~~~~~
+
+You want to create a social network that will store profile information about
+each user as well as allow the user to create various types of posts and updates
+which will then be seen on their "friends'" walls.
+
+Solution
+~~~~~~~~
+
+The solution described below assumes a *directed* social graph where a user can
+choose whether or not to follow another user. The solution is designed in such a
+way as to minimize the number of documents that must be loaded in order to
+display any given page, even at the expense of complicating updates.
+
+The particulars of what type of data you want ot host on your social network
+obviously depends on the type of social network you are designing, and is largely
+beyond the scope of this use case. In particular, the main variables that you
+will have to consider in adapting this use case to your particular situation are:
+
+What data should be in a user profile
+  This may include gender, age, interests, relationship status, etc. for a
+  "casual" social network, or may include resume-type data for a more
+  "business-oriented" social network.
+What type of updates are allowed
+  Again, depending on what flavor of social network you are designing, you may
+  wish to allow posts such as status updates, photos, links, checkins, and
+  polls, or you may wish to restrict your users to links and status updates.
+
+Schema Design
+~~~~~~~~~~~~~
+
+In the solution presented here, you will use two main "independent" collections
+and two "dependent" collections to store user profile data and posts.
+
+Independent Collections
+```````````````````````
+
+The first
+collection, ``social.user``, stores the social graph information for a given user
+along with the user's profile data:
+
+.. code-block:: javascript
+
+    {
+      _id: 'T4Y...AE', // base64-encoded ObjectId
+      name: 'Rick',
+      profile: { ... age, location, interests, etc. ...
}, + followers: { + "T4Y...AD": { name: 'Jared' }, + "T4Y...AE": { name: 'Max' }, + "T4Y...AF": { name: 'Bernie' }, + "T4Y...AH": { name: 'Paul' }, + ... + ], + circles: { + work: { + "T4Y...AD": { name: 'Jared' }, + "T4Y...AE": { name: 'Max' }, + "T4Y...AF": { name: 'Bernie' }, + "T4Y...AH": { name: 'Paul' }, + ... }, + ...} + ] + } + +There are a few things ot note about this schema: + +- Rather than using a "raw" ``ObjectId`` for your ``_id`` field, you'll use a + base64-encoded version. This allows you to use ``_id`` values as keys in + subdocuments, which both reduces the memory footprint of these subdocuments as + well as speeding up some operations. +- The users being "followed" are broken into ``circles`` to facilitate sharing + with a subgroup. +- The ``followers`` subdocument is technically redundant, since it can be + computed from the ``circles`` property. Having ``followers`` available on the + ``social.user`` document, however, is useful both for displaying the user's + followers on the profile or "wall" page, as well as propagating posts to other + users, as you'll see below. +- The particular profile data stored for the user is isolated into the + ``profile`` subdocument, allowing you to evolve the schema as necessary without + worrying about introducing bugs into the social graph. + +Of course, to make the network interesting, it's necessary to add various types of +posts. These are stored in the ``social.post`` collection: + +.. code-block:: javascript + + { + _id: ObjectId(...), + by: { id: "T4Y...AE", name: 'Max' }, + type: 'status', + ts: ISODateTime(...), + detail: { + text: 'Loving MongoDB' }, + comments: [ + { by: { id:"T4Y...AG", name: 'Dwight' }, + ts: ISODateTime(...), + text: 'Right on!' }, + ... all comments listed ... ] + } + +Here, the post stores the author information (``by``), the post ``type``, a +timestamp ``ts``, post details ``detail`` (which vary by post type), and a +``comments`` array. 
In this case, the schema embeds all comments on a post as a +time-sorted flat array. For a more in-depth exploration of the other approaches +to storing comments, please see the document +:doc:`CMS: Storing Comments `. + +One thing to note about the ``social.post`` collection is that it encapsulates +the polymorphic ``detail`` subdocument which would store different data for a +photo post versus a status update, for example. + +Dependent Collections +``````````````````````` + +social.wall (block of X most recent [partial] posts on user's wall) + +.. code-block:: javascript + + { + _id: ObjectId(...), + user_id: "T4Y...AE", + num_posts: 42, + posts: [ + { id: ObjectId(...), + ts: ISODateTime(...), + by: { id: "T4Y...AE", name: 'Max' }, + type: 'status', + detail: { text: 'Loving MongoDB' }, + comments: [ + { by: { id: "T4Y...AG", name: 'Dwight', + ts: ISODateTime(...), + text: 'Right on!' }, + ... only last X comments listed ... + ] + }, + { id: ObjectId(...), + ts: ISODateTime(...), + by: { id: "T4Y...AE", name: 'Max' }, + type: 'checkin', + detail: { + text: 'Great office!', + geo: [ 40.724348,-73.997308 ], + name: '10gen Office', + photo: 'http://....' }, + comments: [ + { by: { id: "T4Y...AD", name: 'Jared' }, + ts: ISODateTime(...), + text: 'Wrong coast!' }, + ... only last X comments listed ... + ] + }, + { id: ObjectId(...), + ts: ISODateTime(...), + by: { id: "T4Y...g9", name: 'Rick' }, + type: 'status', + detail: { + text: 'So when do you crush Oracle?' }, + comments: [ + { by: { id: "T4Y...AE", name: 'Max' }, + ts: ISODateTime(...), + text: 'Soon... ;-)' }, + ... only last X comments listed ... + ] + }, + ] + } + +social.news (block of X most recent [partial] posts on user's news feed) + +.. code-block:: javascript + + { + _id: ObjectId(...), + user_id: "T4Y...AE", + num_posts: 42, + posts: [ ... 
] + } + +Operations +---------- + +TODO: summary of the operations section + +The examples that follow use the Python programming language and the +:api:`PyMongo ` :term:`driver` for MongoDB, but you +can implement this system using any language you choose. + +Operation 1 +~~~~~~~~~~~ + +TODO: describe what the operation is (optional) + +Query +````` + +TODO: describe query + +Index Support +````````````` + +TODO: describe indexes to optimize this query + +Sharding +-------- + +.. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding` wiki + page. From 21976a3380451d4c74c8cda34f9b748c54907ab9 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Thu, 12 Apr 2012 18:03:39 -0400 Subject: [PATCH 11/17] More details and operations Signed-off-by: Rick Copeland --- .../use-cases/social-user-profile.txt | 248 ++++++++++++++++-- 1 file changed, 221 insertions(+), 27 deletions(-) diff --git a/source/applications/use-cases/social-user-profile.txt b/source/applications/use-cases/social-user-profile.txt index f5ce17304a2..54431f9f2cb 100644 --- a/source/applications/use-cases/social-user-profile.txt +++ b/source/applications/use-cases/social-user-profile.txt @@ -24,20 +24,22 @@ Solution ~~~~~~~~ The solution described below assumes a *directed* social graph where a user can -choose whether or not to follow another user. The solution is designed in such a +choose whether or not to follow another user. Additionally, the user can +designate "circles" of users to follow, in order to facilitate fine-grained +control of privacy. The solution presented below is designed in such a way as to minimize the number of documents that must be loaded in order to display any given page, even at the expense of complicating updates. 
-The particulars of what type of data you want ot host on your social network +The particulars of what type of data you want to host on your social network obviously depends on the type of social network you are designing, and is largely beyond the scope of this use case. In particular, the main variables that you will have to consider in adapting this use case to your particular situation are: -What data should be in a user profile +What data should be in a user profile? This may include gender, age, interests, relationship status, etc. for a "casual" social network, or may include resume-type data for a more "business-oriented" social network. -What type of updates are allowed +What type of updates are allowed? Again, depending on what flavor of social network you are designing, you may wish to allow posts such as status updates, photos, links, checkins, and polls, or you may wish to restrict your users to links and status updates. @@ -46,7 +48,7 @@ Schema Design ~~~~~~~~~~~~~ In the solution presented here, you will use two main "independent" collections -and two "dependent" collections to store user profile data and posts. +and three "dependent" collections to store user profile data and posts. Independent Collections ``````````````````````` @@ -62,39 +64,43 @@ along with the user's profile data: name: 'Rick', profile: { ... age, location, interests, etc. ... }, followers: { - "T4Y...AD": { name: 'Jared' }, - "T4Y...AE": { name: 'Max' }, - "T4Y...AF": { name: 'Bernie' }, - "T4Y...AH": { name: 'Paul' }, + "T4Y...AD": { name: 'Jared', circles: [ 'python', 'authors'] }, + "T4Y...AF": { name: 'Bernie', circles: [ 'python' ] }, + "T4Y...AI": { name: 'Meghan', circles: [ 'python', 'speakers' ] }, ... ], circles: { - work: { + "10gen": { "T4Y...AD": { name: 'Jared' }, "T4Y...AE": { name: 'Max' }, "T4Y...AF": { name: 'Bernie' }, "T4Y...AH": { name: 'Paul' }, ... 
},
         ...}
-      ]
+      },
+      blocked: ['gh1...0d'],
+      pages: { wall: 4, news: 3 }
     }
 
-There are a few things ot note about this schema:
+There are a few things to note about this schema:
 
 - Rather than using a "raw" ``ObjectId`` for your ``_id`` field, you'll use a
   base64-encoded version. This allows you to use ``_id`` values as keys in
   subdocuments, which both reduces the memory footprint of these subdocuments as
   well as speeding up some operations.
-- The users being "followed" are broken into ``circles`` to facilitate sharing
-  with a subgroup.
-- The ``followers`` subdocument is technically redundant, since it can be
-  computed from the ``circles`` property. Having ``followers`` available on the
-  ``social.user`` document, however, is useful both for displaying the user's
-  followers on the profile or "wall" page, as well as propagating posts to other
-  users, as you'll see below.
+- The social graph is stored bidirectionally in the ``followers`` and ``circles``
+  properties. While this is technically redundant, having the bidirectional
+  connections is useful both for displaying the user's followers on the profile
+  page, as well as propagating posts to other users, as shown below.
+- In addition to the normal "positive" social graph, the schema above also stores
+  a block list which contains an array of user ids for posters whose posts never
+  appear on the user's wall or news feed.
 - The particular profile data stored for the user is isolated into the
   ``profile`` subdocument, allowing you to evolve the schema as necessary without
   worrying about introducing bugs into the social graph.
+- The ``pages`` property is used to store the number of pages in the
+  ``social.wall``, and ``social.news`` collections for this
+  particular user. These will be used below when creating new posts.
 
 Of course, to make the network interesting, it's necessary to add various types of
 posts. These are stored in the ``social.post`` collection:
@@ -104,6 +110,7 @@ posts.
These are stored in the ``social.post`` collection:
     {
       _id: ObjectId(...),
       by: { id: "T4Y...AE", name: 'Max' },
+      circles: [ '*public*' ],
       type: 'status',
       ts: ISODateTime(...),
       detail: {
@@ -115,32 +122,53 @@ posts. These are stored in the ``social.post`` collection:
           ... all comments listed ... ]
     }
 
-Here, the post stores the author information (``by``), the post ``type``, a
+Here, the post stores minimal author information (``by``), the post ``type``, a
 timestamp ``ts``, post details ``detail`` (which vary by post type), and a
 ``comments`` array. In this case, the schema embeds all comments on a post as a
 time-sorted flat array. For a more in-depth exploration of the other approaches
 to storing comments, please see the document
 :doc:`CMS: Storing Comments `.
 
-One thing to note about the ``social.post`` collection is that it encapsulates
-the polymorphic ``detail`` subdocument which would store different data for a
-photo post versus a status update, for example.
+A couple of points are worthy of further discussion:
+
+- Author information is truncated; just enough is stored in each ``by`` property
+  to display the author name and a link to the author profile. If your user
+  wants more detail on a particular author, you can fetch this information as
+  they request it. Storing minimal information like this helps keep the document
+  small (and therefore fast).
+- The visibility of the post is controlled via the ``circles`` property; any user
+  that is part of one of the listed circles can view the post. The special values
+  ``'*public*'`` and ``'*circles*'`` allow the user to share a post with the
+  whole world or with any users in any of the posting user's circles, respectively.
+- Different types of posts may contain different types of data in the ``detail``
+  field. Isolating this polymorphic information into a subdocument is a good
+  practice, helping you to clearly see which parts of the document are common to
+  all posts and which can vary.
In this case, you would store different data for + a photo post versus a status update, while still keeping the metadata (``_id``, + ``by``, ``circles``, ``type``, ``ts``, and ``comments``) the same. Dependent Collections -``````````````````````` +````````````````````` -social.wall (block of X most recent [partial] posts on user's wall) +In addition to the independent collections above, for optimal performance you'll +need to create a few dependent collections that will be used to cache +information for display. The first of these collections is the ``social.wall`` +collection, and is intended to display a "wall" containing posts created by or +directed to a particular user. The format of the ``social.wall`` collection +follows. .. code-block:: javascript { _id: ObjectId(...), user_id: "T4Y...AE", + page: 4, num_posts: 42, posts: [ { id: ObjectId(...), ts: ISODateTime(...), by: { id: "T4Y...AE", name: 'Max' }, + circles: [ '*public*' ], type: 'status', detail: { text: 'Loving MongoDB' }, comments: [ @@ -153,6 +181,7 @@ social.wall (block of X most recent [partial] posts on user's wall) { id: ObjectId(...), ts: ISODateTime(...), by: { id: "T4Y...AE", name: 'Max' }, + circles: [ '*circles*' ], type: 'checkin', detail: { text: 'Great office!', @@ -169,6 +198,7 @@ social.wall (block of X most recent [partial] posts on user's wall) { id: ObjectId(...), ts: ISODateTime(...), by: { id: "T4Y...g9", name: 'Rick' }, + circles: [ '10gen' ], type: 'status', detail: { text: 'So when do you crush Oracle?' }, @@ -179,16 +209,35 @@ social.wall (block of X most recent [partial] posts on user's wall) ... only last X comments listed ... ] }, + ... ] } -social.news (block of X most recent [partial] posts on user's news feed) +There are a few things to note about this schema: + +- Each post is listed with an abbreviated number of comments (3 might be + typical.) This is to keep the size of the document reasonable. 
If you need to + display more comments on a post, you would then query the ``social.post`` + collection for full details. +- There are actually multiple ``social.wall`` documents for each ``social.user`` + document. This allows the system to keep a "page" of recent posts in the + initial page view, fetching older "pages" if requested. A ``page`` property + keeps track of the position of this page of posts on the user's overall wall + timeline along with the timestamps on individual posts. +- Once again, the ``by`` properties store only the minimal author information for + display, helping to keep this document small. + +The other dependent collection you'll use is ``social.news``, posts from people +the user follows. This schema includes much of the same information as the +``social.wall`` information, so the document below has been abbreviated for +clarity: .. code-block:: javascript { _id: ObjectId(...), user_id: "T4Y...AE", + page: 3, num_posts: 42, posts: [ ... ] } @@ -196,12 +245,157 @@ social.news (block of X most recent [partial] posts on user's news feed) Operations ---------- -TODO: summary of the operations section +Since the schemas above optimize for read performance at the possible expense +of write performance, you should ideally provide a queueing system for +processing updates which may take longer than your desired web request latency. The examples that follow use the Python programming language and the :api:`PyMongo ` :term:`driver` for MongoDB, but you can implement this system using any language you choose. +Viewing a News Feed or Wall Posts +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The most common operation on a social network probably the display of a +particular user's news feed, followed by a user's wall posts. Since the +``social.news`` and ``social.wall`` collections are optimized for these +operations, the query is fairly straightforward. 
Since these two collections
+share a schema, viewing the posts for a news feed and for a wall are quite
+similar operations, and can be supported by the same code:
+
+.. code-block:: python
+
+    def get_posts(collection, user_id, page=None):
+        spec = { 'user_id': user_id }
+        if page is not None:
+            spec['page'] = {'$lte': page}
+        cur = collection.find(spec)
+        cur = cur.sort('page', -1)
+        for page_doc in cur:
+            for post in reversed(page_doc['posts']):
+                yield page_doc['page'], post
+
+The function ``get_posts`` above will retrieve all the posts on a particular user's
+wall or news feed in reverse-chronological order. Some special handling is
+required to efficiently achieve the reverse-chronological ordering:
+
+- The ``posts`` within a page are actually stored in chronological order, so the
+  order of these posts must be reversed before displaying.
+- As a user pages through her wall, it's preferable to avoid fetching the first
+  few pages from the server each time. To achieve this, the code above specifies
+  the first page to fetch in the ``page`` argument, passing this in as an
+  ``$lte`` expression in the query.
+- Rather than only yielding the post itself, the post's page is also yielded from
+  the generator. This provides the ``page`` argument used in any subsequent calls
+  to ``get_posts``.
+
+There is one other issue that needs to be considered in selecting posts for
+display: privacy settings. In order to handle privacy issues effectively, you'll
+need to use some filter functions on the posts generated above by ``get_posts``. The
+first of these filters is used to determine whether to show a post when the user
+is viewing his or her own wall:
+
+..
code-block:: python
+
+    def visible_on_own_wall(user, post):
+        '''if poster is followed by user, post is visible'''
+        for circle, users in user['circles'].items():
+            if post['by']['id'] in users: return True
+        return False
+
+In addition to the user's wall, your social network might provide an "incoming"
+page that contains all posts directed towards a user regardless of whether that
+poster is followed by the user. In this case, you would use a block list
+to filter posts:
+
+.. code-block:: python
+
+    def visible_on_own_incoming(user, post):
+        '''if poster is not blocked by user, post is visible'''
+        return post['by']['id'] not in user['blocked']
+
+When viewing a news feed or another user's wall, the permission check is a bit
+different based on the post's ``circles`` property:
+
+.. code-block:: python
+
+    def visible_post(user, post):
+        if post['circles'] == ['*public*']:
+            # public posts always visible
+            return True
+        circles_user_is_in = set(
+            user['followers'].get(post['by']['id'], {}).get('circles', []))
+        if not circles_user_is_in:
+            # user is not circled by poster; post is invisible
+            return False
+        if post['circles'] == ['*circles*']:
+            # post is public to all followed users; post is visible
+            return True
+        for circle in post['circles']:
+            if circle in circles_user_is_in:
+                # User is in a circle receiving this post
+                return True
+        return False
+
+Index Support
+`````````````
+
+In order to quickly retrieve the pages in the desired order, you'll need an index
+on (``user_id``, ``page``) in both the ``social.news`` and ``social.wall``
+collections. Since this combination is in fact unique, you should go ahead and
+specify ``unique=True`` for the index (this will become important later).
+
+.. code-block:: pycon
+
+    >>> db.social.news.ensure_index([
+    ...     ('user_id', 1),
+    ...     ('page', -1)],
+    ...     unique=True)
+    >>> db.social.wall.ensure_index([
+    ...     ('user_id', 1),
+    ...     ('page', -1)],
+    ...     unique=True)
+
+
+Creating a New Post
+~~~~~~~~~~~~~~~~~~~
+
+..
code-block:: python + + from datetime import datetime + POSTS_PER_PAGE=25 + + def post(user, dest_user, type, detail, circles): + ts = datetime.utcnow() + post = { + 'ts': ts, + 'by': { id: user['id'], name: user['name'] }, + 'circles': circles, + 'type': type, + 'detail': detail, + 'comments': [] } + # Update global post collection + db.social.post.insert(post) + if dest_user in user['followers'] + result = db.social.wall.update( + { 'user_id': user['id'], 'page': user['wall_pages'] } + + +Commenting on a Post +~~~~~~~~~~~~~~~~~~~~ + +Adding a User to a Circle +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Removing a User from a Circle +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Viewing a User's Profile +~~~~~~~~~~~~~~~~~~~~~~~~ + +Another common read operation on social networks is viewing a user's profile, +including their wall posts. The code is actually quite similar to the code for + Operation 1 ~~~~~~~~~~~ From b8bc787299f6936094ab7d80347a634cb395860e Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Fri, 13 Apr 2012 16:37:31 -0400 Subject: [PATCH 12/17] Finish draft of social networking doc Signed-off-by: Rick Copeland --- .../use-cases/social-user-profile.txt | 296 ++++++++++++++---- 1 file changed, 235 insertions(+), 61 deletions(-) diff --git a/source/applications/use-cases/social-user-profile.txt b/source/applications/use-cases/social-user-profile.txt index 54431f9f2cb..03176844a06 100644 --- a/source/applications/use-cases/social-user-profile.txt +++ b/source/applications/use-cases/social-user-profile.txt @@ -78,8 +78,7 @@ along with the user's profile data: ... }, ...} }, - blocked: ['gh1...0d'], - pages: { wall: 4, news: 3 } + blocked: ['gh1...0d'] } There are a few things to note about this schema: @@ -97,10 +96,6 @@ There are a few things to note about this schema: appear on the user's wall or news feed. 
- The particular profile data stored for the user is isolated into the ``profile`` subdocument, allowing you to evolve the schema as necessary without - worrying about introducing bugs into the social graph. -- The ``pages`` property is used to store the number of pages in the - ``social.wall``, and ``social.news`` collections for this - particular user. These will be used below when creating new posts. Of course, to make the network interesting, it's necessary to add various types of posts. These are stored in the ``social.post`` collection: @@ -162,8 +157,7 @@ follows. { _id: ObjectId(...), user_id: "T4Y...AE", - page: 4, - num_posts: 42, + month: '201204', posts: [ { id: ObjectId(...), ts: ISODateTime(...), @@ -171,14 +165,15 @@ follows. circles: [ '*public*' ], type: 'status', detail: { text: 'Loving MongoDB' }, + comments_shown: 3, comments: [ - { by: { id: "T4Y...AG", name: 'Dwight', + { by: { id: "T4Y...AG", name: 'Dwight' }, ts: ISODateTime(...), text: 'Right on!' }, - ... only last X comments listed ... + ... only last 3 comments listed ... ] }, { id: ObjectId(...), ts: ISODateTime(...), by: { id: "T4Y...AE", name: 'Max' }, circles: [ '*circles*' ], @@ -188,11 +183,12 @@ follows. geo: [ 40.724348,-73.997308 ], name: '10gen Office', photo: 'http://....' }, + comments_shown: 1, comments: [ { by: { id: "T4Y...AD", name: 'Jared' }, ts: ISODateTime(...), text: 'Wrong coast!' }, - ... only last X comments listed ... + ... only last 1 comment listed ... ] }, { id: ObjectId(...), @@ -202,11 +198,12 @@ follows. type: 'status', detail: { text: 'So when do you crush Oracle?' }, + comments_shown: 2, comments: [ { by: { id: "T4Y...AE", name: 'Max' }, ts: ISODateTime(...), text: 'Soon... ;-)' }, - ... only last X comments listed ... + ... only last 2 comments listed ... ] }, ... @@ -220,12 +217,13 @@ There are a few things to note about this schema: display more comments on a post, you would then query the ``social.post`` collection for full details.
- There are actually multiple ``social.wall`` documents for each ``social.user`` - document. This allows the system to keep a "page" of recent posts in the - initial page view, fetching older "pages" if requested. A ``page`` property - keeps track of the position of this page of posts on the user's overall wall - timeline along with the timestamps on individual posts. + document, one wall document per month. This allows the system to keep a "page" of + recent posts in the initial page view, fetching older months if requested. - Once again, the ``by`` properties store only the minimal author information for display, helping to keep this document small. +- The number of comments displayed for each post is stored in ``comments_shown`` + so that later updates can find posts displaying more than a certain number of + comments, since the ``$size`` query operator does not allow inequality comparisons. The other dependent collection you'll use is ``social.news``, posts from people the user follows. This schema includes much of the same information as the @@ -237,8 +235,7 @@ clarity: { _id: ObjectId(...), user_id: "T4Y...AE", - page: 3, - num_posts: 42, + month: '201204', posts: [ ... ] } @@ -265,29 +262,29 @@ similar operations, and can be supported by the same code: .. code-block:: python - def get_posts(collection, user_id, page=None): + def get_posts(collection, user_id, month=None): - spec = { 'user_id': viewed_user_id } + spec = { 'user_id': user_id } - if page is not None: - spec['page'] = {'$lte': page} + if month is not None: + spec['month'] = {'$lte': month} cur = collection.find(spec) - cur = cur.sort('page', -1) + cur = cur.sort('month', -1) for page in cur: for post in reversed(page['posts']): - yield page['page'], post + yield page['month'], post The function ``get_posts`` above will retrieve all the posts on a particular user's wall or news feed in reverse-chronological order.
Some special handling is -required to efficieintly achieve the reverse-chronological ordering: +required to efficiently achieve the reverse-chronological ordering: -- The ``posts`` within a page are actually stored in chronological order, so the +- The ``posts`` within a month are actually stored in chronological order, so the order of these posts must be reversed before displaying. - As a user pages through her wall, it's preferable to avoid fetching the first - few pages from the server each time. To achieve this, the code above specifies - the first page to fetch in the ``page`` argument, passing this in as an + few months from the server each time. To achieve this, the code above specifies + the first month to fetch in the ``month`` argument, passing this in as an ``$lte`` expression in the query. -- Rather than only yielding the post itself, the post's page is also yielded from - the generator. This provides the ``page`` argument used in any subsequent calls - to ``get_posts``. +- Rather than only yielding the post itself, the post's month is also yielded from + the generator. This provides the ``month`` argument to be used in any + subsequent calls to ``get_posts``. There is one other issue that needs to be considered in selecting posts for display: privacy settings. In order to handle privacy issues effectively, you'll @@ -341,32 +338,113 @@ Index Support ````````````` In order to quickly retrieve the pages in the desired order, you'll need an index -on (``user_id``, ``page``) in both the ``social.news`` and ``social.wall`` -collections. Since this combination is in fact unique, you should go ahead and -specify ``unique=True`` for the index (this will become important later). +on (``user_id``, ``month``) in both the ``social.news`` and ``social.wall`` +collections. .. code-block:: pycon - >>> db.social.news.ensure_index([ - ... ('user_id', 1), - ... ('page', -1)], - ... unique=True) - >>> db.social.wall.ensure_index([ - ... ('user_id', 1), - ... ('page', -1)], - ... 
unique=True) + >>> for collection in (db.social.news, db.social.wall): + ... collection.ensure_index([ + ... ('user_id', 1), + ... ('month', -1)]) + +Commenting on a Post +~~~~~~~~~~~~~~~~~~~~ + +Other than viewing walls and news feeds, commenting on posts is the next most +common action taken on social networks. To create a comment by ``user`` on a +given ``post`` containing the given ``text``, you'll need to execute code similar +to the following: + +.. code-block:: python + + from datetime import datetime + + def comment(user, post_id, text): + ts = datetime.utcnow() + comment = { + 'by': { 'id': user['id'], 'name': user['name'] }, + 'ts': ts, + 'text': text } + # Update the social.post collection (the system of record) + db.social.post.update( + { '_id': post_id }, + { '$push': { 'comments': comment } } ) + + # Update the embedded copies in social.wall and social.news; + # '$' targets the element of the posts array matched by 'posts.id' + db.social.wall.update( + { 'posts.id': post_id }, + { '$push': { 'posts.$.comments': comment }, + '$inc': { 'posts.$.comments_shown': 1 } }, + multi=True) + + db.social.news.update( + { 'posts.id': post_id }, + { '$push': { 'posts.$.comments': comment }, + '$inc': { 'posts.$.comments_shown': 1 } }, + multi=True) + +.. note:: + + One thing to note in this function is the presence of a couple of ``multi=True`` + update statements. Since these can potentially take quite a long time, this + function is a good candidate for processing 'out of band' from the regular + request-response flow of your application. + +The code above can actually result in an unbounded number of comments being +inserted into the ``social.wall`` and ``social.news`` collections. To compensate +for this, you should periodically run the following update statement to truncate +the number of displayed comments and keep the size of the news and wall documents +manageable: + +.. 
code-block:: python + + COMMENTS_SHOWN = 3 + + def truncate_extra_comments(): + db.social.news.update( + { 'posts.comments_shown': { '$gt': COMMENTS_SHOWN } }, + { '$pop': { 'posts.$.comments': -1 }, + '$inc': { 'posts.$.comments_shown': -1 } }, + multi=True) + db.social.wall.update( + { 'posts.comments_shown': { '$gt': COMMENTS_SHOWN } }, + { '$pop': { 'posts.$.comments': -1 }, + '$inc': { 'posts.$.comments_shown': -1 } }, + multi=True) + +Index Support +````````````` +In order to execute the updates to the ``social.news`` and ``social.wall`` +collections shown above efficiently, you'll need to be able to quickly locate both +of the following types of documents: + +- Documents containing a given post +- Documents containing posts displaying too many comments + +To quickly execute these updates, you'll need to create the following indexes: + +.. code-block:: pycon + + >>> for collection in (db.social.news, db.social.wall): + ... collection.ensure_index('posts.id') + ... collection.ensure_index('posts.comments_shown') Creating a New Post ~~~~~~~~~~~~~~~~~~~ +Creating a new post fills out the content-creation activities on a social +network: + .. 
code-block:: python from datetime import datetime - POSTS_PER_PAGE=25 def post(user, dest_user, type, detail, circles): ts = datetime.utcnow() + month = ts.strftime('%Y%m') post = { 'ts': ts, - 'by': { id: user['id'], name: user['name'] }, + 'by': { 'id': user['id'], 'name': user['name'] }, @@ -376,43 +454,139 @@ Creating a New Post 'comments': [] } # Update global post collection db.social.post.insert(post) - if dest_user in user['followers'] - result = db.social.wall.update( - { 'user_id': user['id'], 'page': user['wall_pages'] } + # Copy to dest user's wall + if user['id'] not in dest_user['blocked']: + append_post(db.social.wall, [dest_user['id']], month, post) + # Copy to followers' news feeds + if circles == ['*public*']: + dest_userids = set(user['followers'].keys()) + else: + dest_userids = set() + if circles == [ '*circles*' ]: + circles = user['circles'].keys() + for circle in circles: + dest_userids.update(user['circles'][circle]) + append_post(db.social.news, dest_userids, month, post) + +The basic sequence of operations in the code above is the following: + +#. The post is first saved into the "system of record," the ``social.post`` + collection. +#. The recipient's wall is updated with the post. +#. The news feeds of everyone who is 'circled' in the post are updated with the + post. + +The ``append_post`` function updates a particular wall or group of news feeds: + +.. code-block:: python -Commenting on a Post -~~~~~~~~~~~~~~~~~~~~ + def append_post(collection, dest_userids, month, post): + collection.update( + { 'user_id': { '$in': sorted(dest_userids) }, + 'month': month }, + { '$push': { 'posts': post } }, + multi=True) -Adding a User to a Circle -~~~~~~~~~~~~~~~~~~~~~~~~~ +Index Support +````````````` -Removing a User from a Circle -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +In order to quickly update the ``social.wall`` and ``social.news`` collections, +you'll once again need an index on both ``user_id`` and ``month``.
This time, +however, the optimal order on the indexes is (``month``, ``user_id``). This is +due to the fact that updates to these collections will always be for the current +month; having month appear first in the index makes the index *right-aligned*, +requiring significantly less memory to store the active part of the index. -Viewing a User's Profile -~~~~~~~~~~~~~~~~~~~~~~~~ +To actually create this index, you'll need to execute the following commands: -Another common read operation on social networks is viewing a user's profile, -including their wall posts. The code is actually quite similar to the code for +.. code-block:: pycon -Operation 1 -~~~~~~~~~~~ + >>> for collection in (db.social.news, db.social.wall): + ... collection.ensure_index([ + ... ('month', 1), + ... ('user_id', 1)]) -TODO: describe what the operation is (optional) -Query -````` +Maintaining the Social Graph +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -TODO: describe query +In your social network, maintaining the social graph is an infrequent but +essential operation. To add a user ``other`` to the current user +``self``\'s circles, you'll need to run the following function: + +.. code-block:: python + + def circle_user(self, other, circle): + circles_path = 'circles.%s.%s' % (circle, other['_id']) + db.social.user.update( + { '_id': self['_id'] }, + { '$set': { circles_path: { 'name': other['name'] } } }) + follower_circles = 'followers.%s.circles' % self['_id'] + follower_name = 'followers.%s.name' % self['_id'] + db.social.user.update( + { '_id': other['_id'] }, + { '$push': { follower_circles: circle }, + '$set': { follower_name: self['name'] } }) + +Note that in this solution, previous posts of the ``other`` user are not added to +the ``self`` user's news feed or wall. Actually including these past posts would +be an expensive and complex operation that goes beyond the scope of this use case. + +Of course, you'll also need to support *removing* users from circles: + +.. 
code-block:: python + + def uncircle_user(self, other, circle): + circles_path = 'circles.%s.%s' % (circle, other['_id']) + db.social.user.update( + { '_id': self['_id'] }, + { '$unset': { circles_path: 1 } }) + follower_circles = 'followers.%s.circles' % self['_id'] + db.social.user.update( + { '_id': other['_id'] }, + { '$pull': { follower_circles: circle } }) + # Special case: drop the follower entry once 'other' is in none of 'self''s circles + db.social.user.update( + { '_id': other['_id'], follower_circles: {'$size': 0 } }, + { '$unset': { 'followers.%s' % self['_id']: 1 } }) Index Support ````````````` -TODO: describe indexes to optimize this query +In both the circling and uncircling cases, the ``_id`` is included in the update +queries, so no additional indexes are required. Sharding -------- +In order to scale beyond the capacity of a single replica set, you will need to +shard each of the collections mentioned above. Since the ``social.user``, +``social.wall``, and ``social.news`` collections contain documents which are +specific to a given user, the user's ``_id`` field is an appropriate shard key: + +.. code-block:: pycon + + >>> db.command('shardcollection', 'social.user', { + ... 'key': {'_id': 1 } } ) + { "collectionsharded": "social.user", "ok": 1 } + >>> db.command('shardcollection', 'social.wall', { + ... 'key': {'user_id': 1 } } ) + { "collectionsharded": "social.wall", "ok": 1 } + >>> db.command('shardcollection', 'social.news', { + ... 'key': {'user_id': 1 } } ) + { "collectionsharded": "social.news", "ok": 1 } + +It turns out that using the posting user's ``_id`` is actually *not* the best +choice for a shard key for ``social.post``. This is due to the fact that queries +and updates to this collection are done using the ``_id`` field, and sharding on +``by.id``, while tempting, would require these updates to be *broadcast* to all +shards.
To shard the ``social.post`` collection on ``_id``, then, you'll need to +execute the following command:: + + >>> db.command('shardcollection', 'social.post', { + ... 'key': {'_id': 1 } } ) + { "collectionsharded": "social.post", "ok": 1 } + .. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding` wiki page. From f53c0d6dae5ae6388bbcd0d46f7477883b8dba79 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Fri, 13 Apr 2012 21:24:47 -0400 Subject: [PATCH 13/17] Make a note indicating that the index on (month, user_id) is extraneous Signed-off-by: Rick Copeland --- .../use-cases/social-user-profile.txt | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/source/applications/use-cases/social-user-profile.txt b/source/applications/use-cases/social-user-profile.txt index 03176844a06..7d06233f74e 100644 --- a/source/applications/use-cases/social-user-profile.txt +++ b/source/applications/use-cases/social-user-profile.txt @@ -498,15 +498,12 @@ due to the fact that updates to these collections will always be for the current month; having month appear first in the index makes the index *right-aligned*, requiring significantly less memory to store the active part of the index. -To actually create this index, you'll need to execute the following commands: - -.. code-block:: pycon - - >>> for collection in (db.social.news, db.social.wall): - ... collection.ensure_index([ - ... ('month', 1), - ... ('user_id', 1)]) - +*However*, in this case, since you have already defined an index on (``user_id``, +``month``), which *must* be in that order so that you can do the sort on +``month``, adding a second index is unnecessary and would actually consume +more RAM to maintain two indexes. So even though this particular operation would +benefit from having an index on (``month``, ``user_id``), it's best to leave out +any additional indexes here.
Maintaining the Social Graph ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 400394d544372e94e2c25114f32e240ff697ed19 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Wed, 9 May 2012 09:44:53 -0400 Subject: [PATCH 14/17] Correct misinformation in category-hierarchy, remove old copy from applications/use-cases Signed-off-by: Rick Copeland --- .../ecommerce-category-hierarchy.txt | 247 ------------------ source/use-cases/category-hierarchy.txt | 7 +- 2 files changed, 2 insertions(+), 252 deletions(-) delete mode 100644 source/applications/use-cases/ecommerce-category-hierarchy.txt diff --git a/source/applications/use-cases/ecommerce-category-hierarchy.txt b/source/applications/use-cases/ecommerce-category-hierarchy.txt deleted file mode 100644 index 38c49aa5e8f..00000000000 --- a/source/applications/use-cases/ecommerce-category-hierarchy.txt +++ /dev/null @@ -1,247 +0,0 @@ -============================== -E-Commerce: Category Hierarchy -============================== - -Problem -======= - -You have a product hierarchy for an e-commerce site that you want to -query frequently and update somewhat frequently. - -Solution Overview -================= - -This solution keeps each category in its own document, along with a list of its -ancestors. The category hierarchy used in this example will be -based on different categories of music: - -.. figure:: img/ecommerce-category1.png - :align: center - :alt: Initial category hierarchy - - Initial category hierarchy - -Since categories change relatively infrequently, the focus here will be on the -operations needed to keep the hierarchy up-to-date and less on the performance -aspects of updating the hierarchy. - -Schema Design -============= - -Each category in the hierarchy will be represented by a document. That -document will be identified by an ``ObjectId`` for internal -cross-referencing as well as a human-readable name and a url-friendly -``slug`` property. 
Additionally, the schema stores an ancestors list along -with each document to facilitate displaying a category along with all -its ancestors in a single query. - -.. code-block:: javascript - - { "_id" : ObjectId("4f5ec858eb03303a11000002"), - "name" : "Modal Jazz", - "parent" : ObjectId("4f5ec858eb03303a11000001"), - "slug" : "modal-jazz", - "ancestors" : [ - { "_id" : ObjectId("4f5ec858eb03303a11000001"), - "slug" : "bop", - "name" : "Bop" }, - { "_id" : ObjectId("4f5ec858eb03303a11000000"), - "slug" : "ragtime", - "name" : "Ragtime" } ] - } - -Operations -========== - -Here, the various category manipulations you may need in an ecommerce site are -described as they would occur using the schema above. The examples use the Python -programming language and the ``pymongo`` MongoDB driver, but implementations -would be similar in other languages as well. - -Read and Display a Category ---------------------------- - -The simplest operation is reading and displaying a hierarchy. In this -case, you might want to display a category along with a list of "bread -crumbs" leading back up the hierarchy. In an E-commerce site, you'll -most likely have the slug of the category available for your query, as it can be -parsed from the URL. - -.. code-block:: python - - category = db.categories.find( - {'slug':slug}, - {'_id':0, 'name':1, 'ancestors.slug':1, 'ancestors.name':1 }) - -Here, the slug is used to retrieve the category, fetching only those -fields needed for display. - -Index Support -~~~~~~~~~~~~~ - -In order to support this common operation efficiently, you'll need an index -on the 'slug' field. Since slug is also intended to be unique, the index over it -should be unique as well: - -.. code-block:: python - - db.categories.ensure_index('slug', unique=True) - -Add a Category to the Hierarchy -------------------------------- - -Adding a category to a hierarchy is relatively simple. Suppose you wish -to add a new category 'Swing' as a child of 'Ragtime': - -.. 
figure:: img/ecommerce-category2.png - :align: center - :alt: Adding a category - - Adding a category - -In this case, the initial insert is simple enough, but after this -insert, the "Swing" category is still missing its ancestors array. To define -this, you'll need a helper function to build the ancestor list: - -.. code-block:: python - - def build_ancestors(_id, parent_id): - parent = db.categories.find_one( - {'_id': parent_id}, - {'name': 1, 'slug': 1, 'ancestors':1}) - parent_ancestors = parent.pop('ancestors') - ancestors = [ parent ] + parent_ancestors - db.categories.update( - {'_id': _id}, - {'$set': { 'ancestors': ancestors } }) - -Note that you only need to travel one level in the hierarchy to get the -ragtime's ancestors and build swing's entire ancestor list. Now you can -actually perform the insert and rebuild the ancestor list: - -.. code-block:: python - - doc = dict(name='Swing', slug='swing', parent=ragtime_id) - swing_id = db.categories.insert(doc) - build_ancestors(swing_id, ragtime_id) - -Index Support -~~~~~~~~~~~~~ - -Since these queries and updates all selected based on ``_id``, you only need -the default MongoDB-supplied index on ``_id`` to support this operation -efficiently. - -Change the Ancestry of a Category ---------------------------------- - -Suppose you wish to reorganize the hierarchy by moving 'bop' under -'swing': - -.. figure:: img/ecommerce-category3.png - :align: center - :alt: Change the parent of a category - - Change the parent of a category - -The initial update is straightforward: - -.. code-block:: python - - db.categories.update( - {'_id':bop_id}, {'$set': { 'parent': swing_id } } ) - -Now, you still need to update the ancestor list for bop and all its -descendants. In this case, you can't guarantee that the ancestor list of -the parent category is always correct, since MongoDB may -process the categories out-of-order. To handle this, you'll need a new -ancestor-building function: - -.. 
code-block:: python - - def build_ancestors_full(_id, parent_id): - ancestors = [] - while parent_id is not None: - parent = db.categories.find_one( - {'_id': parent_id}, - {'parent': 1, 'name': 1, 'slug': 1, 'ancestors':1}) - parent_id = parent.pop('parent') - ancestors.append(parent) - db.categories.update( - {'_id': _id}, - {'$set': { 'ancestors': ancestors } }) - -Now, at the expense of a few more queries up the hierarchy, you can -easily reconstruct all the descendants of 'bop': - -.. code-block:: python - - for cat in db.categories.find( - {'ancestors._id': bop_id}, - {'parent_id': 1}): - build_ancestors_full(cat['_id'], cat['parent_id']) - -Index Support -~~~~~~~~~~~~~ - -In this case, an index on ``ancestors._id`` would be helpful in -determining which descendants need to be updated: - -.. code-block:: python - - db.categories.ensure_index('ancestors._id') - -Rename a Category ------------------ - -Renaming a category would normally be an extremely quick operation, but -in this case due to denormalization, you also need to update the -descendants. Suppose you need to rename "Bop" to "BeBop:" - -.. figure:: img/ecommerce-category4.png - :align: center - :alt: Rename a category - - Rename a category - -First, you need to update the category name itself: - -.. code-block:: python - - db.categories.update( - {'_id':bop_id}, {'$set': { 'name': 'BeBop' } } ) - -Next, you need to update each descendant's ancestors list: - -.. code-block:: python - - db.categories.update( - {'ancestors._id': bop_id}, - {'$set': { 'ancestors.$.name': 'BeBop' } }, - multi=True) - -Here, you can use the positional operation ``$`` to match the exact "ancestor" -entry that matches the query, as well as the ``multi`` option on the -update to ensure the rename operation occurs in a single server -round-trip. - -Index Support -~~~~~~~~~~~~~ - -In this case, the index you have already defined on ``ancestors._id`` is -sufficient to ensure good performance. 
- -Sharding -======== - -In this solution, it is unlikely that you would want to shard the -collection since it's likely to be quite small. If you *should* decide to -shard, the use of an ``_id`` field for most updates makes it an -ideal sharding candidate. The sharding commands you'd use to shard -the category collection would then be the following: - -.. code-block:: python - - >>> db.command('shardcollection', 'categories', { - ... key: {'_id': 1} }) - { "collectionsharded" : "categories", "ok" : 1 } diff --git a/source/use-cases/category-hierarchy.txt b/source/use-cases/category-hierarchy.txt index 084715938cd..1d31218e9f9 100644 --- a/source/use-cases/category-hierarchy.txt +++ b/source/use-cases/category-hierarchy.txt @@ -262,10 +262,7 @@ following operation in the Python/PyMongo console. .. code-block:: python - >>> db.command('shardcollection', 'categories') + >>> db.command('shardcollection', 'categories', { + ... 'key': {'_id': 1} }) { "collectionsharded" : "categories", "ok" : 1 } -.. note:: - - There is no need to specify the shard key, as MongoDB will use the - ``_id`` field as :term:`shard key` by default. 
From 937271860225d171256ea9703f097fd95b726e8d Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Wed, 9 May 2012 10:01:39 -0400 Subject: [PATCH 15/17] Remove old copy of inventory-management, correct some extraneous uses of safe=True, remove incorrect assertion that mongodb will shard on _id by default, and reunify the python console session at the end Signed-off-by: Rick Copeland --- .../ecommerce-inventory-management.txt | 408 ------------------ source/use-cases/inventory-management.txt | 31 +- 2 files changed, 13 insertions(+), 426 deletions(-) delete mode 100644 source/applications/use-cases/ecommerce-inventory-management.txt diff --git a/source/applications/use-cases/ecommerce-inventory-management.txt b/source/applications/use-cases/ecommerce-inventory-management.txt deleted file mode 100644 index e78b070c87e..00000000000 --- a/source/applications/use-cases/ecommerce-inventory-management.txt +++ /dev/null @@ -1,408 +0,0 @@ -================================ -E-Commerce: Inventory Management -================================ - -.. default-domain:: mongodb - -Overview --------- - -Problem -~~~~~~~ - -You have a product catalog and you would like to maintain an accurate -inventory count as users shop your online store, adding and removing -things from their cart. - -Solution -~~~~~~~~ - -In an ideal world, consumers would begin browsing an online store, add -items to their shopping cart, and proceed in a timely manner to checkout -where their credit cards would always be successfully validated and -charged. In the real world, however, customers often add or remove items -from their shopping cart, change quantities, abandon the cart, and have -problems at checkout time. - -This solution keeps the traditional metaphor of the shopping cart, but -the shopping cart will *age* . Once a shopping cart has not been active -for a certain period of time, all the items in the cart once again -become part of available inventory and the cart is cleared. 
The state -transition diagram for a shopping cart is below: - -.. figure:: img/ecommerce-inventory1.png - :align: center - :alt: - -Schema -~~~~~~ - -In your inventory collection, you need to maintain the current available -inventory of each stock-keeping unit (SKU) as well as a list of 'carted' -items that may be released back to available inventory if their shopping -cart times out: - -.. code-block:: javascript - - { - _id: '00e8da9b', - qty: 16, - carted: [ - { qty: 1, cart_id: 42, - timestamp: ISODate("2012-03-09T20:55:36Z"), }, - { qty: 2, cart_id: 43, - timestamp: ISODate("2012-03-09T21:55:36Z"), }, - ] - } - -(Note that, while in an actual implementation, you might choose to merge -this schema with the product catalog schema described in -:doc:`E-Commerce: Product Catalog `, the inventory -schema is simplified here for brevity.) Continuing the metaphor of the -brick-and-mortar store, your SKU above has 16 items on the shelf, 1 in one cart, -and 2 in another for a total of 19 unsold items of merchandise. - -For the shopping cart model, you need to maintain a list of (``sku``, -``quantity``, ``price``) line items: - -.. code-block:: javascript - - { - _id: 42, - last_modified: ISODate("2012-03-09T20:55:36Z"), - status: 'active', - items: [ - { sku: '00e8da9b', qty: 1, item_details: {...} }, - { sku: '0ab42f88', qty: 4, item_details: {...} } - ] - } - -Note that the cart model includes item details in each line -item. This allows your app to display the contents of the cart to the user -without needing a second query back to the catalog collection to fetch -the details. - -Operations ----------- - -Here, the various inventory-related operations in an ecommerce site are described -as they would occur using the schema above. The examples use the Python -programming language and the ``pymongo`` MongoDB driver, but implementations -would be similar in other languages as well. 
- -Add an Item to a Shopping Cart -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Updating -```````` - -The most basic operation is moving an item off the "shelf" in to the -"cart." The constraint is that you would like to guarantee that you never -move an unavailable item off the shelf into the cart. To solve this -problem, this solution ensures that inventory is only updated if there is -sufficient inventory to satisfy the request: - -.. code-block:: python - - def add_item_to_cart(cart_id, sku, qty, details): - now = datetime.utcnow() - - # Make sure the cart is still active and add the line item - result = db.cart.update( - {'_id': cart_id, 'status': 'active' }, - { '$set': { 'last_modified': now }, - '$push': - 'items': {'sku': sku, 'qty':qty, 'details': details } - }, - safe=True) - if not result['updatedExisting']: - raise CartInactive() - - # Update the inventory - result = db.inventory.update( - {'_id':sku, 'qty': {'$gte': qty}}, - {'$inc': {'qty': -qty}, - '$push': { - 'carted': { 'qty': qty, 'cart_id':cart_id, - 'timestamp': now } } }, - safe=True) - if not result['updatedExisting']: - # Roll back our cart update - db.cart.update( - {'_id': cart_id }, - { '$pull': { 'items': {'sku': sku } } } - ) - raise InadequateInventory() - -Note here in particular that the system does not trust that the request is -satisfiable. The first check makes sure that the cart is still "active" -(more on inactive carts below) before adding a line item. The next check -verifies that sufficient inventory exists to satisfy the request before -decrementing inventory. In the case of inadequate inventory, the system -*compensates* for the non-transactional nature of MongoDB by removing the -cart update. Using safe=True and checking the result in the case of -these two updates allows you to report back an error to the user if the -cart has become inactive or available quantity is insufficient to -satisfy the request. 
- -Indexing -```````` - -To support this query efficiently, all you really need is an index on -``_id``, which MongoDB provides by default. - -Modifying the Quantity in the Cart -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Process -``````` - -Here, you'd like to allow the user to adjust the quantity of items in their -cart. The system must ensure that when they adjust the quantity upward, there -is sufficient inventory to cover the quantity, and it must also update the -particular ``carted`` entry for the user's cart. - -.. code-block:: python - - def update_quantity(cart_id, sku, old_qty, new_qty): - now = datetime.utcnow() - delta_qty = new_qty - old_qty - - # Make sure the cart is still active and update the line item quantity - result = db.cart.update( - {'_id': cart_id, 'status': 'active', 'items.sku': sku }, - {'$set': { - 'last_modified': now, - 'items.$.qty': new_qty }, - }, - safe=True) - if not result['updatedExisting']: - raise CartInactive() - - # Update the inventory - result = db.inventory.update( - {'_id':sku, - 'carted.cart_id': cart_id, - 'qty': {'$gte': delta_qty} }, - {'$inc': {'qty': -delta_qty }, - '$set': { 'carted.$.qty': new_qty, 'carted.$.timestamp': now } }, - safe=True) - if not result['updatedExisting']: - # Roll back our cart update - db.cart.update( - {'_id': cart_id, 'items.sku': sku }, - {'$set': { 'items.$.qty': old_qty } - }) - raise InadequateInventory() - -Note in particular the use of the positional operator ``$`` to -update the particular ``carted`` entry and line item that matched the -query. This allows the system to update the inventory and keep track of the data -necessary to "roll back" the cart in a single atomic operation. The code above -also ensures the cart is active and timestamps it, as in the case of adding -items to the cart. - -Indexing -```````` - -To support this query efficiently, again all you really need is an index on ``_id``.
- -Checking Out -~~~~~~~~~~~~ - -Process -``````` - -During checkout, you'd like to validate the method of payment and remove -the various ``carted`` items after the transaction has succeeded. - -.. code-block:: python - - def checkout(cart_id): - now = datetime.utcnow() - - # Make sure the cart is still active and set to 'pending'. Also - # fetch the cart details so we can calculate the checkout price - cart = db.cart.find_and_modify( - {'_id': cart_id, 'status': 'active' }, - update={'$set': { 'status': 'pending','last_modified': now } } ) - if cart is None: - raise CartInactive() - - # Validate payment details; collect payment - try: - collect_payment(cart) - db.cart.update( - {'_id': cart_id }, - {'$set': { 'status': 'complete' } } ) - db.inventory.update( - {'carted.cart_id': cart_id}, - {'$pull': {'carted': {'cart_id': cart_id} } }, - multi=True) - except: - db.cart.update( - {'_id': cart_id }, - {'$set': { 'status': 'active' } } ) - raise - -Here, the cart is first "locked" by setting its status to "pending" -(disabling any modifications); the same update verifies that the cart is -still active. The system then collects payment. MongoDB's -``findAndModify`` command is used to atomically update the cart and return its -details so you can capture payment information. If the payment is -successful, you then remove the ``carted`` items from individual items' -inventory and set the cart to "complete." If payment is unsuccessful, you -unlock the cart by setting its status back to "active" and report a -payment error. - -Indexing -```````` - -Once again the ``_id`` default index is enough to make this operation efficient. - -Returning Timed-Out Items to Inventory -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Process -``````` - -Periodically, you'd like to expire carts that have been inactive for a -given number of seconds, returning their line items to available -inventory: - -..
code-block:: python - - def expire_carts(timeout): - now = datetime.utcnow() - threshold = now - timedelta(seconds=timeout) - - # Lock and find all the expiring carts - db.cart.update( - {'status': 'active', 'last_modified': { '$lt': threshold } }, - {'$set': { 'status': 'expiring' } }, - multi=True ) - - # Actually expire each cart - for cart in db.cart.find({'status': 'expiring'}): - - # Return all line items to inventory - for item in cart['items']: - db.inventory.update( - { '_id': item['sku'], - 'carted.cart_id': cart['_id'], - 'carted.qty': item['qty'] - }, - {'$inc': { 'qty': item['qty'] }, - '$pull': { 'carted': { 'cart_id': cart['_id'] } } }) - - db.cart.update( - {'_id': cart['_id'] }, - {'$set': { 'status': 'expired' } }) - -Here, you first find all carts to be expired and then, for each cart, -return its items to inventory. Once all items have been returned to -inventory, the cart is moved to the 'expired' state. - -Indexing -```````` - -In this case, you need to be able to efficiently query carts based on -their ``status`` and ``last_modified`` values, so an index on these would help -the performance of the periodic expiration process: - -.. code-block:: python - - db.cart.ensure_index([('status', 1), ('last_modified', 1)]) - -Note in particular the order in which the index is defined: in order to -efficiently support range queries (``$lt`` in this case), the ranged field -must be the last field in the index. Also note that there is no need to -define an index on the ``status`` field alone, as any queries for status -can use the compound index you have defined here. - -Error Handling -~~~~~~~~~~~~~~ - -There is one failure mode above that thus far has not been handled adequately: the -case of an exception that occurs after updating the inventory collection -but before updating the shopping cart. The result of this failure mode -is a shopping cart that may be absent or expired whose ``carted`` -items in the inventory have not been returned to available inventory.
To -account for this case, you'll need to run a cleanup method periodically that -will find old ``carted`` items and check the status of their cart: - -.. code-block:: python - - def cleanup_inventory(timeout): - now = datetime.utcnow() - threshold = now - timedelta(seconds=timeout) - - # Find all the expiring carted items - for item in db.inventory.find( - {'carted.timestamp': {'$lt': threshold }}): - - # Find all the carted items that matched - carted = dict( - (carted_item['cart_id'], carted_item) - for carted_item in item['carted'] - if carted_item['timestamp'] < threshold) - - # Find any carts that are active and refresh the carted items - for cart in db.cart.find( - { '_id': {'$in': list(carted.keys()) }, - 'status':'active'}): - cart_id = cart['_id'] - db.inventory.update( - { '_id': item['_id'], - 'carted.cart_id': cart_id }, - { '$set': {'carted.$.timestamp': now } }) - del carted[cart_id] - - # All the carted items left in the dict need to now be - # returned to inventory - for cart_id, carted_item in carted.items(): - db.inventory.update( - { '_id': item['_id'], - 'carted.cart_id': cart_id, - 'carted.qty': carted_item['qty'] }, - { '$inc': { 'qty': carted_item['qty'] }, - '$pull': { 'carted': { 'cart_id': cart_id } } }) - -Note that the function above is safe, as it only returns ``carted`` items to -inventory when their cart is no longer active; items in active carts simply -have their timestamps refreshed. This function could, however, be slow and could -slow down other updates and queries, so it should be run -infrequently. - -Sharding --------- - -If you choose to shard this system, the use of an ``_id`` field for most of -your updates makes ``_id`` an ideal sharding candidate, for both carts and -products. Using ``_id`` as your shard key allows all updates that query on -``_id`` to be routed to a single mongod process. There are two potential -drawbacks with using ``_id`` as a shard key, however.
- - If the cart collection's ``_id`` is generated in a generally increasing - order, new carts will all initially be assigned to a single shard. - Cart expiration and inventory adjustment require several broadcast - queries and updates if ``_id`` is used as a shard key. - -It turns out you can mitigate the first pitfall by choosing a random -value (perhaps the sha-1 hash of an ``ObjectId``) as the ``_id`` of each cart -as it is created. The second objection is valid, but relatively -unimportant, as the expiration function runs relatively infrequently and can be -slowed down by the judicious use of ``sleep()`` calls in order to -minimize server load. - -The sharding commands you'd use to shard the cart and inventory -collections, then, would be the following: - -.. code-block:: pycon - - >>> db.command('shardcollection', 'inventory', - ... key={'_id': 1}) - { "collectionsharded" : "inventory", "ok" : 1 } - >>> db.command('shardcollection', 'cart', - ... key={'_id': 1}) - { "collectionsharded" : "cart", "ok" : 1 } diff --git a/source/use-cases/inventory-management.txt b/source/use-cases/inventory-management.txt index 555bef435bf..219d55dceff 100644 --- a/source/use-cases/inventory-management.txt +++ b/source/use-cases/inventory-management.txt @@ -139,8 +139,7 @@ function operation. # Roll back our cart update db.cart.update( {'_id': cart_id }, - { '$pull': { 'items': {'sku': sku } } }, - safe=True) + { '$pull': { 'items': {'sku': sku } } }) raise InadequateInventory() .. admonition:: The system does not trust that the available inventory can satisfy a request @@ -196,8 +195,7 @@ the user's cart, that the inventory exists to cover the modification. # Roll back our cart update db.cart.update( {'_id': cart_id, 'items.sku': sku }, - {'$set': { 'items.$.qty': old_qty } }, - safe=True) + {'$set': { 'items.$.qty': old_qty } }) raise InadequateInventory() ..
note:: @@ -248,7 +246,7 @@ following procedure: db.cart.update( {'_id': cart_id }, {'$set': { 'status': 'active' } } ) - raise PaymentError() + raise Begin by "locking" the cart by setting its status to "pending" Then the system will verify that the cart is still active and collect @@ -362,7 +360,8 @@ return them to available inventory if they do not. threshold = now - timedelta(seconds=timeout) # Find all the expiring carted items - for item in db.inventory.find({'carted.timestamp': {'$lt': threshold }}): + for item in db.inventory.find( + {'carted.timestamp': {'$lt': threshold }}): # Find all the carted items that matched carted = dict( @@ -371,7 +370,9 @@ return them to available inventory if they do not. if carted_item['timestamp'] < threshold) # First Pass: Find any carts that are active and refresh the carted items - for cart in db.cart.find({ '_id': {'$in': carted.keys() }, 'status':'active'}): + for cart in db.cart.find( + { '_id': {'$in': carted.keys() }, + 'status':'active'}): cart = carted[cart['_id']] db.inventory.update( @@ -453,17 +454,11 @@ There are two drawbacks for using ``_id`` as a shard key: Use the following commands in the Python/PyMongo console to shard the cart and inventory collections: -.. code-block:: python - - db.command('shardcollection', 'inventory') - db.command('shardcollection', 'cart') - -.. code-block:: javascript +.. code-block:: pycon + >>> db.command('shardcollection', 'inventory', + ... key={'_id': 1}) { "collectionsharded" : "inventory", "ok" : 1 } + >>> db.command('shardcollection', 'cart', + ... key={'_id': 1}) { "collectionsharded" : "cart", "ok" : 1 } - -.. note:: - - There is no need to specify the shard key in these operations, - because MongoDB will use the ``_id`` by default.
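The reservation logic in ``add_item_to_cart`` has three parts: a conditional update of the cart, a conditional decrement of inventory, and a compensating rollback. You can model it without a database. The following is a minimal in-memory sketch. It assumes plain dicts stand in for the ``cart`` and ``inventory`` collections. The function and exception definitions here are hypothetical illustrations, not PyMongo calls or the use case's own code.

```python
from datetime import datetime, timezone


class CartInactive(Exception):
    """The cart does not exist or is no longer 'active'."""


class InadequateInventory(Exception):
    """Fewer units are on the shelf than the request needs."""


def add_item_to_cart(carts, inventory, cart_id, sku, qty, details=None):
    """Hypothetical in-memory model of the two conditional updates.

    `carts` maps cart _id -> cart document and `inventory` maps
    SKU -> inventory document; both are plain dicts standing in for
    the MongoDB collections (an assumption made for illustration).
    """
    now = datetime.now(timezone.utc)

    # First update: only "matches" if the cart exists and is still active.
    cart = carts.get(cart_id)
    if cart is None or cart['status'] != 'active':
        raise CartInactive()
    cart['last_modified'] = now
    cart['items'].append({'sku': sku, 'qty': qty, 'item_details': details})

    # Second update: only "matches" if enough units are on the shelf.
    stock = inventory.get(sku)
    if stock is None or stock['qty'] < qty:
        # Compensate by undoing the first update; like the '$pull' rollback,
        # this removes every line item for this SKU from the cart.
        cart['items'] = [i for i in cart['items'] if i['sku'] != sku]
        raise InadequateInventory()
    stock['qty'] -= qty
    stock['carted'].append({'qty': qty, 'cart_id': cart_id, 'timestamp': now})


# Usage: reserve one unit of SKU '00e8da9b' for cart 42.
carts = {42: {'status': 'active', 'items': []}}
inventory = {'00e8da9b': {'qty': 16, 'carted': []}}
add_item_to_cart(carts, inventory, 42, '00e8da9b', 1)
print(inventory['00e8da9b']['qty'])  # 15
```

Like the ``$pull`` rollback it mirrors, the compensation step removes every line item for the SKU, not only the one added by the failing call.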
From 1d6452ef7ba56612336de44fefde957a033f0434 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Wed, 9 May 2012 10:08:59 -0400 Subject: [PATCH 16/17] Remove incorrect statement about automatically sharding on _id, fix console sessions, fix typo Signed-off-by: Rick Copeland --- .../cms-metadata-and-asset-management.txt | 454 ------------------ .../metadata-and-asset-management.txt | 12 +- 2 files changed, 7 insertions(+), 459 deletions(-) delete mode 100644 source/applications/use-cases/cms-metadata-and-asset-management.txt diff --git a/source/applications/use-cases/cms-metadata-and-asset-management.txt b/source/applications/use-cases/cms-metadata-and-asset-management.txt deleted file mode 100644 index c2693f8e7be..00000000000 --- a/source/applications/use-cases/cms-metadata-and-asset-management.txt +++ /dev/null @@ -1,454 +0,0 @@ -================================== -CMS: Metadata and Asset Management -================================== - -Problem -======= - -You are designing a content management system (CMS) and you want to use -MongoDB to store the content of your sites. - -Solution Overview -================= - -The approach in this solution is inspired by the design of Drupal, an -open source CMS written in PHP on relational databases that is available -at `http://www.drupal.org `_. In this case, you -will take advantage of MongoDB's dynamically typed collections to -*polymorphically* store all your content nodes in the same collection. -Navigational information will be stored in its own collection since -it has relatively little in common with the content nodes, and is not covered in -this use case. - -The main node types which are covered here are: - -Basic page - Basic pages are useful for displaying - infrequently-changing text such as an 'about' page. With a basic - page, the salient information is the title and the - content. 
-Blog entry - Blog entries record a "stream" of posts from users - on the CMS and store title, author, content, and date as relevant - information. -Photo - Photos participate in photo galleries, and store title, - description, author, and date along with the actual photo binary - data. - -Schema Design -============= - -The node collection contains documents of various formats, but they -all share a similar structure: each document includes an ``_id``, a ``nonce``, -and a ``metadata`` subdocument with ``type``, ``section``, ``slug``, ``title``, ``created`` date, -``author``, and ``tags`` fields. The -``section`` property is used to identify groupings of items (a -particular blog or photo gallery, for instance). The ``slug`` property is -a URL-friendly representation of the node that is unique within its -section, and is used for mapping URLs to nodes. The ``metadata`` subdocument also -contains a ``detail`` field whose contents vary by document type: - -.. code-block:: javascript - - { - _id: ObjectId(…), - nonce: ObjectId(…), - metadata: { - type: 'basic-page', - section: 'my-photos', - slug: 'about', - title: 'About Us', - created: ISODate(...), - author: { _id: ObjectId(…), name: 'Rick' }, - tags: [ ... ], - detail: { text: '# About Us\n…' } - } - } - -For the basic page above, the ``detail`` field might simply contain the text -of the page. In the case of a blog entry, the document might resemble -the following instead: - -.. code-block:: javascript - - { - … - metadata: { - … - type: 'blog-entry', - section: 'my-blog', - slug: '2012-03-noticed-the-news', - … - detail: { - publish_on: ISODate(…), - text: 'I noticed the news from Washington today…' - } - } - } - -Photos present something of a special case. Since you'll need to store -potentially very large photos, it is useful to be able to separate the binary -storage of photo data from the metadata storage.
GridFS provides just such a -mechanism, splitting a "filesystem" of potentially very large "files" into -two collections, the ``files`` collection and the ``chunks`` collection. In -this case, the two collections will be called ``cms.assets.files`` and -``cms.assets.chunks``. Documents in the ``cms.assets.files`` -collection will be used to store the normal GridFS metadata as well as CMS node -metadata: - -.. code-block:: javascript - - { - _id: ObjectId(…), - length: 123..., - chunkSize: 262144, - uploadDate: ISODate(…), - contentType: 'image/jpeg', - md5: 'ba49a...', - metadata: { - nonce: ObjectId(…), - slug: '2012-03-invisible-bicycle', - type: 'photo', - section: 'my-album', - title: 'Kitteh', - created: ISODate(…), - author: { _id: ObjectId(…), name: 'Jared' }, - tags: [ … ], - detail: { - filename: 'kitteh_invisible_bike.jpg', - resolution: [ 1600, 1600 ], … } - } - } - -Note that the "normal" node schema is embedded here in the photo schema, allowing -the use of the same code to manipulate nodes of all types. - -Operations -========== - -This section describes some common queries and updates that you might need for -your CMS, paying particular attention to any "tweaks" necessary for the various -node types. The examples use the Python -programming language and the ``pymongo`` MongoDB driver, but implementations -would be similar in other languages as well. - -Create and Edit Content Nodes ------------------------------ - -The content producers using your CMS will be creating and editing content -most of the time. Most content-creation activities are relatively -straightforward: - -..
code-block:: python - - db.cms.nodes.insert({ - 'nonce': ObjectId(), - 'metadata': { - 'section': 'my-blog', - 'slug': '2012-03-noticed-the-news', - 'type': 'blog-entry', - 'title': 'Noticed in the News', - 'created': datetime.utcnow(), - 'author': { '_id': user_id, 'name': 'Rick' }, - 'tags': [ 'news', 'musings' ], - 'detail': { - 'publish_on': datetime.utcnow(), - 'text': 'I noticed the news from Washington today…' } - } - }) - -Once the node is in the database, multiple editors may try to modify it at -the same time. To handle this, the schema uses the special ``nonce`` -value to detect when another editor may have modified the document and -allow the application to resolve any conflicts: - -.. code-block:: python - - def update_text(section, slug, nonce, text): - result = db.cms.nodes.update( - { 'metadata.section': section, - 'metadata.slug': slug, - 'nonce': nonce }, - { '$set':{'metadata.detail.text': text, 'nonce': ObjectId() } }, - safe=True) - if not result['updatedExisting']: - raise ConflictError() - -You might also want to perform metadata edits to the item such as adding -tags: - -.. code-block:: python - - db.cms.nodes.update( - { 'metadata.section': section, 'metadata.slug': slug }, - { '$addToSet': { 'metadata.tags': { '$each': [ 'interesting', 'funny' ] } } }) - -In this case, you don't actually need to supply the nonce (nor update it) -since you're using the atomic ``$addToSet`` modifier in MongoDB. - -Index Support -~~~~~~~~~~~~~ - -Updates in this case are based on equality queries containing the -(``section``, ``slug``, and ``nonce``) values. To support these queries, you -*might* use the following index: - -.. code-block:: pycon - - >>> db.cms.nodes.ensure_index([ - ... ('metadata.section', 1), ('metadata.slug', 1), ('nonce', 1) ]) - -Also note, however, that you'd like to ensure that two editors don't -create two documents with the same section and slug. To support this, you need a -second index with a unique constraint: - -..
code-block:: pycon - - >>> db.cms.nodes.ensure_index([ - ... ('metadata.section', 1), ('metadata.slug', 1)], unique=True) - -In fact, since the unique (``section``, ``slug``) index already narrows each -query to at most one document, you don't actually get much benefit from the -first index and can use only the second one to satisfy the update queries as -well. - -Upload a Photo --------------- - -Uploading photos has some things in common with updating a -node, but it also has some extra nuances: - -.. code-block:: python - - def upload_new_photo( - input_file, section, slug, title, author, tags, detail): - fs = GridFS(db, 'cms.assets') - with fs.new_file( - content_type='image/jpeg', - metadata=dict( - type='photo', - locked=datetime.utcnow(), - section=section, - slug=slug, - title=title, - created=datetime.utcnow(), - author=author, - tags=tags, - detail=detail)) as upload_file: - while True: - chunk = input_file.read(upload_file.chunk_size) - if not chunk: break - upload_file.write(chunk) - # unlock the file - db.cms.assets.files.update( - {'_id': upload_file._id}, - {'$set': { 'metadata.locked': None } } ) - -Here, since uploading the photo is a non-atomic operation, you need to -"lock" the file during upload by writing the current datetime into the -record. This lets the application detect when a file upload may be stalled, which -is helpful when working with multiple editors. This solution assumes that, for -photo upload, the last update wins: - -..
code-block:: python - - def update_photo_content(input_file, section, slug): - fs = GridFS(db, 'cms.assets') - - # Delete the old version if it's unlocked or was locked more than 5 - # minutes ago - file_obj = db.cms.assets.files.find_one( - { 'metadata.section': section, - 'metadata.slug': slug, - 'metadata.locked': None }) - if file_obj is None: - threshold = datetime.utcnow() - timedelta(seconds=300) - file_obj = db.cms.assets.files.find_one( - { 'metadata.section': section, - 'metadata.slug': slug, - 'metadata.locked': { '$lt': threshold } }) - if file_obj is None: raise FileDoesNotExist() - fs.delete(file_obj['_id']) - - # update content, keep metadata unchanged - file_obj['metadata']['locked'] = datetime.utcnow() - with fs.new_file(**file_obj) as upload_file: - while True: - chunk = input_file.read(upload_file.chunk_size) - if not chunk: break - upload_file.write(chunk) - # unlock the file - db.cms.assets.files.update( - {'_id': upload_file._id}, - {'$set': { 'metadata.locked': None } } ) - -You can, of course, perform metadata edits to the item such as adding -tags without the extra complexity: - -.. code-block:: python - - db.cms.assets.files.update( - { 'metadata.section': section, 'metadata.slug': slug }, - { '$addToSet': { - 'metadata.tags': { '$each': [ 'interesting', 'funny' ] } } }) - -Index Support -~~~~~~~~~~~~~ - -Updates here are also based on equality queries containing the -(``section``, ``slug``) values, so you can use the same types of indexes as were -used in the "regular" node case. Note in particular that you need a -unique constraint on (``section``, ``slug``) to ensure that one of the calls to -``GridFS.new_file()`` will fail if multiple editors try to create or update -the file simultaneously. - -.. code-block:: pycon - - >>> db.cms.assets.files.ensure_index([ - ...
('metadata.section', 1), ('metadata.slug', 1)], unique=True) - -Locate and Render a Node ------------------------- - -You need to be able to locate a node based on its section and slug, which -have been extracted from the page definition and URL by some -other technology. - -.. code-block:: python - - node = db.cms.nodes.find_one( - {'metadata.section': section, 'metadata.slug': slug }) - -Index Support -~~~~~~~~~~~~~ - -The same indexes defined above on (``section``, ``slug``) would -support this lookup efficiently. - -Locate and Render a Photo -------------------------- - -You want to locate an image based on its section and slug, -which have been extracted from the page definition and URL -just as with other nodes. - -.. code-block:: python - - fs = GridFS(db, 'cms.assets') - with fs.get_version( - **{'metadata.section': section, 'metadata.slug': slug }) as img_fp: - # do something with the image file - -Index Support -~~~~~~~~~~~~~ - -The same indexes defined above on (``section``, ``slug``) would also -support this image lookup efficiently. - -Search for Nodes by Tag ------------------------ - -You'd like to retrieve a list of nodes based on their tags: - -.. code-block:: python - - nodes = db.cms.nodes.find({'metadata.tags': tag }) - -Index Support -~~~~~~~~~~~~~ - -To support searching efficiently, you should define indexes on any fields -you intend to use in your query: - -.. code-block:: pycon - - >>> db.cms.nodes.ensure_index('metadata.tags') - -Search for Images by Tag ------------------------- - -Here, you'd like to retrieve a list of images based on their tags: - -..
code-block:: python - - image_file_objects = db.cms.assets.files.find({'metadata.tags': tag }) - fs = GridFS(db, 'cms.assets') - for image_file_object in image_file_objects: - # read each matching image's content from GridFS - image_file = fs.get(image_file_object['_id']) - # do something with the image file - -Index Support -~~~~~~~~~~~~~ - -As above, in order to support searching efficiently, you should define -indexes on any fields you expect to use in the query: - -.. code-block:: pycon - - >>> db.cms.assets.files.ensure_index('metadata.tags') - -Generate a Feed of Recently Published Blog Articles ---------------------------------------------------- - -Here, you need to generate an .rss or .atom feed for your recently -published blog articles, sorted by date descending: - -.. code-block:: python - - articles = db.cms.nodes.find({ - 'metadata.section': 'my-blog', - 'metadata.published': { '$lt': datetime.utcnow() } }) - articles = articles.sort('metadata.published', -1) - -Index Support -~~~~~~~~~~~~~ - -In order to support this operation, you'll need to create an index on (``section``, -``published``) so the items are 'in order' for the query. Note that in cases -where you're sorting or using range queries, as here, the field on which -you're sorting or performing a range query must be the final field in the -index: - -.. code-block:: pycon - - >>> db.cms.nodes.ensure_index( - ... [ ('metadata.section', 1), ('metadata.published', -1) ]) - -Sharding -======== - -In a CMS system, read performance is generally much more important -than write performance. As such, you'll want to optimize the sharding setup -for read performance. In order to achieve the best read performance, you -need to ensure that queries are *routable* by the mongos process. A -second consideration when sharding is that unique indexes do not span -shards. As such, each unique index must be prefixed by the shard key in order to get -the same semantics as described above.
Given -these constraints, sharding the nodes and assets on (``section``, ``slug``) -is a reasonable approach: - -.. code-block:: pycon - - >>> db.command('shardcollection', 'cms.nodes', - ... key={'metadata.section': 1, 'metadata.slug': 1}) - { "collectionsharded" : "cms.nodes", "ok" : 1 } - >>> db.command('shardcollection', 'cms.assets.files', - ... key={'metadata.section': 1, 'metadata.slug': 1}) - { "collectionsharded" : "cms.assets.files", "ok" : 1 } - -If you wish to shard the ``cms.assets.chunks`` collection, you need to shard -on the ``_id`` field (none of the node metadata is available in the -``cms.assets.chunks`` collection): - -.. code-block:: pycon - - >>> db.command('shardcollection', 'cms.assets.chunks', - ... key={'_id': 1}) - { "collectionsharded" : "cms.assets.chunks", "ok" : 1 } - -This actually still maintains the query-routability constraint, since -all reads from GridFS must first look up the document in ``cms.assets.files`` and -then look up the chunks separately (though the GridFS API sometimes -hides this detail). diff --git a/source/use-cases/metadata-and-asset-management.txt b/source/use-cases/metadata-and-asset-management.txt index 9495a2da4d8..2ba775bc0cc 100644 --- a/source/use-cases/metadata-and-asset-management.txt +++ b/source/use-cases/metadata-and-asset-management.txt @@ -349,7 +349,7 @@ To locate an image based on the value of ``metadata.section`` and .. code-block:: python - python = GridFS(db, 'cms.assets') + fs = GridFS(db, 'cms.assets') with fs.get_version({'metadata.section': section, 'metadata.slug': slug }) as img_fpo: # do something with the image file @@ -472,8 +472,10 @@ Use the following operation at the Python/PyMongo shell: >>> db.command('shardcollection', 'cms.nodes', { ... key : { 'metadata.section': 1, 'metadata.slug' : 1 } }) + { "collectionsharded": "cms.nodes", "ok": 1} >>> db.command('shardcollection', 'cms.assets.files', { ...
key : { 'metadata.section': 1, 'metadata.slug' : 1 } }) + { "collectionsharded": "cms.assets.files", "ok": 1} To shard the ``cms.assets.chunks`` collection, you must use the ``_id`` field as the :term:`shard key`. The following operation will @@ -481,10 +483,10 @@ shard the collection .. code-block:: pycon - >>> db.command('shardcollection', 'cms.assets.chunks') + >>> db.command('shardcollection', 'cms.assets.chunks', + ... key={'_id': 1}) + { "collectionsharded": "cms.assets.chunks", "ok": 1} -If you do not specific a shard key, when using the -:dbcommand:`shardcollection` command, MongoDB will shard based on the -``_id`` field. This also ensures routable queries because all reads +Note that sharding on the ``_id`` field ensures routable queries because all reads from GridFS must first look up the document in ``cms.assets.files`` and then look up the chunks separately. From ba4f443ecc105f542ce8d7a66394f354eee80df5 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Wed, 9 May 2012 10:10:59 -0400 Subject: [PATCH 17/17] Add newest batch of use cases to draft/use-cases Signed-off-by: Rick Copeland --- .../use-cases/ad-campaign-management.txt | 0 .../use-cases/ad-serving-ads.txt | 0 .../use-cases/gaming-user-state.txt | 0 .../use-cases/social-user-profile.txt | 0 source/applications/use-cases/index.txt | 60 ------------------- .../use-cases/use-case-template.txt | 60 ------------------- 6 files changed, 120 deletions(-) rename {source/applications => draft}/use-cases/ad-campaign-management.txt (100%) rename {source/applications => draft}/use-cases/ad-serving-ads.txt (100%) rename {source/applications => draft}/use-cases/gaming-user-state.txt (100%) rename {source/applications => draft}/use-cases/social-user-profile.txt (100%) delete mode 100644 source/applications/use-cases/index.txt delete mode 100644 source/applications/use-cases/use-case-template.txt diff --git a/source/applications/use-cases/ad-campaign-management.txt
b/draft/use-cases/ad-campaign-management.txt similarity index 100% rename from source/applications/use-cases/ad-campaign-management.txt rename to draft/use-cases/ad-campaign-management.txt diff --git a/source/applications/use-cases/ad-serving-ads.txt b/draft/use-cases/ad-serving-ads.txt similarity index 100% rename from source/applications/use-cases/ad-serving-ads.txt rename to draft/use-cases/ad-serving-ads.txt diff --git a/source/applications/use-cases/gaming-user-state.txt b/draft/use-cases/gaming-user-state.txt similarity index 100% rename from source/applications/use-cases/gaming-user-state.txt rename to draft/use-cases/gaming-user-state.txt diff --git a/source/applications/use-cases/social-user-profile.txt b/draft/use-cases/social-user-profile.txt similarity index 100% rename from source/applications/use-cases/social-user-profile.txt rename to draft/use-cases/social-user-profile.txt diff --git a/source/applications/use-cases/index.txt b/source/applications/use-cases/index.txt deleted file mode 100644 index 53c4f96c0b6..00000000000 --- a/source/applications/use-cases/index.txt +++ /dev/null @@ -1,60 +0,0 @@ -:orphan: - -========= -Use Cases -========= - - -Real time Analytics -------------------- - -.. toctree:: - :maxdepth: 2 - - real-time-analytics-storing-log-data - real-time-analytics-pre-aggregated-reports - real-time-analytics-hierarchical-aggregation - -E-Commerce ----------- - -.. toctree:: - :maxdepth: 2 - - ecommerce-product-catalog - ecommerce-inventory-management - ecommerce-category-hierarchy - -Content Management Systems --------------------------- - -.. toctree:: - :maxdepth: 2 - - cms-metadata-and-asset-management - cms-storing-comments - -Online Gaming -------------- - -.. toctree:: - :maxdepth: 2 - - gaming-user-state - -Online Advertising ------------------- - -.. toctree:: - :maxdepth: 2 - - ad-serving-ads - ad-campaign-management - -Social Networking ------------------ - -.. 
toctree:: - :maxdepth: 2 - - social-user-profile diff --git a/source/applications/use-cases/use-case-template.txt b/source/applications/use-cases/use-case-template.txt deleted file mode 100644 index 4ab5ba7cd5f..00000000000 --- a/source/applications/use-cases/use-case-template.txt +++ /dev/null @@ -1,60 +0,0 @@ -.. -*- rst -*- - -:orphan: - -==================== -TODO: Section: Title -==================== - -.. default-domain:: mongodb - -Overview --------- - -This document outlines the basic patterns and principles for using -MongoDB as a persistent storage engine for TODO: what are we building? - -Problem -~~~~~~~ - -TODO: describe problem - -Solution -~~~~~~~~ - -TODO: describe assumptions, overview of solution - -Schema Design -~~~~~~~~~~~~~ - -TODO: document collections, doc schemas - -Operations ----------- - -TODO: summary of the operations section - -The examples that follow use the Python programming language and the -:api:`PyMongo ` :term:`driver` for MongoDB, but you -can implement this system using any language you choose. - -Operation 1 -~~~~~~~~~~~ - -TODO: describe what the operation is (optional) - -Query -````` - -TODO: describe query - -Index Support -````````````` - -TODO: describe indexes to optimize this query - -Sharding --------- - -.. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding` wiki - page.