From e21a4647733b5d5f140682f7d95bd695965b0483 Mon Sep 17 00:00:00 2001
From: Rick Copeland
Date: Sat, 24 Mar 2012 16:27:37 -0400
Subject: [PATCH 01/13] Begin work on gaming: user state

Signed-off-by: Rick Copeland
---
 .../use-cases/gaming-user-state.txt | 160 ++++++++++++++++++
 source/applications/use-cases/index.txt | 8 +
 .../use-cases/use-case-template.txt | 58 +++++++
 3 files changed, 226 insertions(+)
 create mode 100644 source/applications/use-cases/gaming-user-state.txt
 create mode 100644 source/applications/use-cases/use-case-template.txt

diff --git a/source/applications/use-cases/gaming-user-state.txt b/source/applications/use-cases/gaming-user-state.txt
new file mode 100644
index 00000000000..8d695ca6bcf
--- /dev/null
+++ b/source/applications/use-cases/gaming-user-state.txt
@@ -0,0 +1,160 @@
+=================================
+Online Gaming: Storing User State
+=================================
+
+.. default-domain:: mongodb
+
+Overview
+--------
+
+This document outlines the basic patterns and principles for using
+MongoDB as a persistent storage engine for user state data from an online
+game, particularly one that contains role-playing characteristics.
+
+Problem
+~~~~~~~
+
+In designing an online game, there is a need to store various
+data about the user's character. Some of the attributes might include:
+
+Character attributes
+   These might include intrinsic characteristics such as strength,
+   dexterity, charisma, etc., as well as variable characteristics such
+   as health, mana (if your game includes magic), etc.
+Character inventory
+   If your game includes the ability for the user to carry around
+   objects, you will need to keep track of the items carried.
+Character location / relationship to the game world
+   If your game allows the user to move their character from one
+   location to another, this information needs to be stored as well.
+ +In addition, you need to store all this data for large numbers of +users who might be playing the game simultaneously, and this data +needs to be both readable and writeable with minimal latency in order +to ensure responsiveness during gameplay. + +Another consideration when designing the persistence backend for an +online game is its flexibility. Particularly in early releases of a +game, you may wish to change gameplay mechanics significantly as you +receive feedback from your users. As you implement these changes, you +need to be able to migrate your persistent data from one format to +another with minimal (or no) downtime. + +Solution +~~~~~~~~ + +The solution presented by this case study assumes that the read and +write performance for the user state is equally important and must be +accessible with minimal latency. + +Schema Design +~~~~~~~~~~~~~ + +Ultimately, the particulars of your schema depends on the particular +design of your game. When designing your schema for the user state, +you should attempt to encapsulate all the commonly used data into the +user object in order to minimize the number of queries to the database +and the number of seeks in a query. If you can manage to +encapsulate all relevant user state into a single document, this +satisfies both these criteria. + +In a role-playing game, then, a typical user state document might look +like the following: + +.. 
code-block:: javascript + + { + _id: ObjectId('...'), + name: 'Tim', + character: { + intrinsics: { + strength: 10, + dexterity: 16, + intelligence: 17, + charisma: 8 }, + class: 'mage', + health: 212, + mana: 152 + }, + location: { + id: 'maze-1', + description: 'a maze of twisty little passages...', + exits: {n:'maze-2', s:'maze-1', e:'maze-3'}, + contents: [ + { qty:1, id:ObjectId('...'), name:'grue' }, + { qty:1, id:ObjectId('...'), name:'Tim' }, + { qty:1, id:ObjectId('...'), name:'scroll of cause fear' }] + }, + armor: [ + { id:ObjectId('...'), region:'head'}, + { id:ObjectId('...'), region:'body'}, + { id:ObjectId('...'), region:'hands'}, + { id:ObjectId('...'), region:'feet'}], + weapons: [ {id:ObjectId('...'), hand:'both'} ], + inventory: [ + { qty:1, id:ObjectId('...'), name:'backpack', contents: [ + { qty:4, id:ObjectId('...'), name: 'potion of healing'}, + { qty:1, id:ObjectId('...'), name: 'scroll of magic mapping'}, + { qty:2, id:ObjectId('...'), name: 'c-rations'} ]}, + { qty:1, id:ObjectId('...'), name:"wizard's hat", bonus:3}, + { qty:1, id:ObjectId('...'), name:"wizard's robe", bonus:0}, + { qty:1, id:ObjectId('...'), name:"old boots", bonus:0}, + { qty:1, id:ObjectId('...'), name:"quarterstaff", bonus:2}, + { qty:523, id:ObjectId('...'), name:"gold" } ] + } + +There are a few things to note about this document. First, information +about the character's location in the game is encapsulated under the +``location`` attribute. Note in particular that all of the information +necessary to render the room is encapsulated within the user's state +document. This allows the game system to render the room without +making a second query to the database to get room information. + +Second, notice that the ``armor`` and ``weapons`` attributes contain +little information about the actual items being worn or carried. This +information is actually stored under the ``inventory`` property. 
Since +the inventory information is stored in the same document, there is no +need to replicate the detailed information about each item into the +``armor`` and ``weapons`` properties. + +Finally, note that ``inventory`` contains the item details necessary +for rendering each item in the character's posession, including any +enchantments (``bonus``) and ``quantity``. Once again, embedding this data +into the character record means you don't have to perform a separate +query to fetch item details necessary for display. + +Operations +---------- + +TODO: summary of the operations section + +The examples that follow use the Python programming language and the +:api:`PyMongo ` :term:`driver` for MongoDB, but you +can implement this system using any language you choose. + +Operation 1 +~~~~~~~~~~~ + +TODO: describe what the operation is (optional) + +Query +````` + +TODO: describe query + +Index Support +````````````` + +TODO: describe indexes to optimize this query + +Sharding +-------- + +Eventually your system's events will exceed the capacity of a single +event logging database instance. In these situations you will want to +use a :term:`shard cluster`, which takes advantage of MongoDB's +:term:`sharding` functionality. This section introduces the unique +sharding concerns for this use case. + +.. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding` wiki + page. diff --git a/source/applications/use-cases/index.txt b/source/applications/use-cases/index.txt index 6144bf7e3dd..5aa6023e118 100644 --- a/source/applications/use-cases/index.txt +++ b/source/applications/use-cases/index.txt @@ -33,3 +33,11 @@ Content Management Systems cms-metadata-and-asset-management cms-storing-comments + +Online Gaming +------------- + +.. 
toctree::
+   :maxdepth: 2
+
+   gaming-user-state
diff --git a/source/applications/use-cases/use-case-template.txt b/source/applications/use-cases/use-case-template.txt
new file mode 100644
index 00000000000..9a8dcef8b44
--- /dev/null
+++ b/source/applications/use-cases/use-case-template.txt
@@ -0,0 +1,58 @@
+:orphan:
+
+====================
+TODO: Section: Title
+====================
+
+.. default-domain:: mongodb
+
+Overview
+--------
+
+This document outlines the basic patterns and principles for using
+MongoDB as a persistent storage engine for TODO: what are we building?
+
+Problem
+~~~~~~~
+
+TODO: describe problem
+
+Solution
+~~~~~~~~
+
+TODO: describe assumptions, overview of solution
+
+Schema Design
+~~~~~~~~~~~~~
+
+TODO: document collections, doc schemas
+
+Operations
+----------
+
+TODO: summary of the operations section
+
+The examples that follow use the Python programming language and the
+:api:`PyMongo ` :term:`driver` for MongoDB, but you
+can implement this system using any language you choose.
+
+Operation 1
+~~~~~~~~~~~
+
+TODO: describe what the operation is (optional)
+
+Query
+`````
+
+TODO: describe query
+
+Index Support
+`````````````
+
+TODO: describe indexes to optimize this query
+
+Sharding
+--------
+
+.. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding` wiki
+   page.
From cfcdb94be0be975b30d601068a513c12f63aa14c Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Thu, 29 Mar 2012 15:23:34 -0400 Subject: [PATCH 02/13] First draft complete on gaming: user state Signed-off-by: Rick Copeland --- .../use-cases/gaming-user-state.txt | 317 +++++++++++++++++- 1 file changed, 302 insertions(+), 15 deletions(-) diff --git a/source/applications/use-cases/gaming-user-state.txt b/source/applications/use-cases/gaming-user-state.txt index 8d695ca6bcf..6f71a5808e5 100644 --- a/source/applications/use-cases/gaming-user-state.txt +++ b/source/applications/use-cases/gaming-user-state.txt @@ -80,15 +80,17 @@ like the following: id: 'maze-1', description: 'a maze of twisty little passages...', exits: {n:'maze-2', s:'maze-1', e:'maze-3'}, + players: [ + { id:ObjectId('...'), name:'grue' }, + { id:ObjectId('...'), name:'Tim' } + ], contents: [ - { qty:1, id:ObjectId('...'), name:'grue' }, - { qty:1, id:ObjectId('...'), name:'Tim' }, { qty:1, id:ObjectId('...'), name:'scroll of cause fear' }] }, + gold: 523, armor: [ { id:ObjectId('...'), region:'head'}, { id:ObjectId('...'), region:'body'}, - { id:ObjectId('...'), region:'hands'}, { id:ObjectId('...'), region:'feet'}], weapons: [ {id:ObjectId('...'), hand:'both'} ], inventory: [ @@ -99,8 +101,7 @@ like the following: { qty:1, id:ObjectId('...'), name:"wizard's hat", bonus:3}, { qty:1, id:ObjectId('...'), name:"wizard's robe", bonus:0}, { qty:1, id:ObjectId('...'), name:"old boots", bonus:0}, - { qty:1, id:ObjectId('...'), name:"quarterstaff", bonus:2}, - { qty:523, id:ObjectId('...'), name:"gold" } ] + { qty:1, id:ObjectId('...'), name:"quarterstaff", bonus:2} ] } There are a few things to note about this document. First, information @@ -126,35 +127,321 @@ query to fetch item details necessary for display. 
Operations ---------- -TODO: summary of the operations section +In an online gaming system with the character state stored in a single document, +the primary operations you'll be performing are querying for the character state +document by ``_id``, extracting relevant data for display, and updating various +attributes about the character. This section describes procedures for performing +these queries, extractions, and updates. The examples that follow use the Python programming language and the :api:`PyMongo ` :term:`driver` for MongoDB, but you can implement this system using any language you choose. -Operation 1 -~~~~~~~~~~~ +Load Character Data from MongoDB +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -TODO: describe what the operation is (optional) +The most basic operation in this system is loading the character state. Query ````` -TODO: describe query +Use the following query to load the user document from MongoDB: + +.. code-block:: pycon + + >>> character = db.characters.find_one({'_id': user_id}) Index Support ````````````` -TODO: describe indexes to optimize this query +In this case, the default index that MongoDB supplies on the ``_id`` field is +sufficient for good performance of this query. + +Extract Armor and Weapon Data for Display +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In order to save space, the schema described above stores item details only in +the ``inventory`` attribute, storing ``ObjectId``\ s in other locations. To display +these item details, as on a character summary window, you need to merge the +information from the ``armor`` and ``weapons`` attributes with information from +the ``inventory`` attribute. + +Suppose, for instance, that your code is displaying the armor data using the +following Jinja2 template: + +.. code-block:: html + +
+    <div>
+      <h3>Armor</h3>
+      <dl>
+        {% if value.head %}
+        <dt>Helmet</dt>
+        <dd>{{value.head[0].description}}</dd>
+        {% endif %}
+        {% if value.hands %}
+        <dt>Gloves</dt>
+        <dd>{{value.hands[0].description}}</dd>
+        {% endif %}
+        {% if value.feet %}
+        <dt>Boots</dt>
+        <dd>{{value.feet[0].description}}</dd>
+        {% endif %}
+        {% if value.body %}
+        <dt>Body Armor</dt>
+        <dd><ul>{% for piece in value.body %}
+          <li>{{piece.description}}</li>
+          {% endfor %}</ul></dd>
+        {% endif %}
+      </dl>
+    </div>
+
+
+In this case, you want the various ``description`` fields above to be text
+similar to "+3 wizard's hat." The context passed to the template above, then,
+would be of the following form:
+
+.. code-block:: python
+
+    {
+        "head": [ { "id":..., "description": "+3 wizard's hat" } ],
+        "hands": [],
+        "feet": [ { "id":..., "description": "old boots" } ],
+        "body": [ { "id":..., "description": "wizard's robe" } ],
+    }
+
+In order to build up this structure, use the following helper functions:
+
+.. code-block:: python
+
+    def get_item_index(inventory):
+        '''Given an inventory attribute, recursively build up an item
+        index (including all items contained within other items)
+        '''
+
+        result = {}
+        for item in inventory:
+            # Embedded items are keyed by 'id' in the schema above
+            result[item['id']] = item
+            if 'contents' in item:
+                result.update(get_item_index(item['contents']))
+        return result
+
+    def describe_item(item):
+        '''Return a copy of the item with a display 'description' added'''
+
+        result = dict(item)
+        # Not every item carries a 'bonus'; a missing or zero bonus is omitted
+        if item.get('bonus'):
+            description = '%+d %s' % (item['bonus'], item['name'])
+        else:
+            description = item['name']
+        result['description'] = description
+        return result
+
+    def get_armor_for_display(character, item_index):
+        '''Given a character document, return an 'armor' value
+        suitable for display'''
+
+        result = dict(head=[], hands=[], feet=[], body=[])
+        for piece in character['armor']:
+            item = describe_item(item_index[piece['id']])
+            result[piece['region']].append(item)
+        return result
+
+In order to actually display the armor, then, you would use the following code:
+
+.. code-block:: pycon
+
+    >>> item_index = get_item_index(
+    ...     character['inventory'] + character['location']['contents'])
+    >>> armor = get_armor_for_display(character, item_index)
+
+Note in particular that you are building an index not only for the items the
+character is actually carrying in inventory, but also for the items that the user
+might interact with in the room.
+
+Similarly, in order to display the weapon information, you need to build a
+structure such as the following:
+
+.. 
code-block:: python
+
+    {
+        "left": None,
+        "right": None,
+        "both": { "description": "+2 quarterstaff" }
+    }
+
+The helper function is similar to ``get_armor_for_display``:
+
+.. code-block:: python
+
+    def get_weapons_for_display(character, item_index):
+        '''Given a character document, return a 'weapons' value
+        suitable for display'''
+
+        result = dict(left=None, right=None, both=None)
+        for piece in character['weapons']:
+            item = describe_item(item_index[piece['id']])
+            result[piece['hand']] = item
+        return result
+
+In order to actually display the weapons, then, you would use the following code:
+
+.. code-block:: pycon
+
+    >>> weapons = get_weapons_for_display(character, item_index)
+
+Extract Character Attributes, Inventory, and Room Information for Display
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In order to display information about the character's attributes, inventory, and
+surroundings, you also need to extract fields from the character state. In this
+case, however, the schema defined above keeps all the relevant information for
+display embedded in those sections of the document. The code for extracting this
+data, then, is the following:
+
+.. code-block:: pycon
+
+    >>> attributes = character['character']
+    >>> inventory = character['inventory']
+    >>> room_data = character['location']
+
+Update Character Inventory
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In your game, suppose the user decides to pick up an item from the room and add
+it to their inventory. In this case, you need to update both the character state
+and the global location state:
+
+.. 
code-block:: python
+
+    def pick_up_item(character, item_index, item_id):
+        '''Transfer an item from the current room to the character's inventory'''
+
+        item = item_index[item_id]
+        character['inventory'].append(item)
+        # Add the item to the inventory and drop it from the embedded room copy
+        db.character.update(
+            { '_id': character['_id'] },
+            { '$push': { 'inventory': item },
+              '$pull': { 'location.contents': { 'id': item_id } } })
+        db.location.update(
+            { '_id': character['location']['id'] },
+            { '$pull': { 'contents': { 'id': item_id } } })
+
+While the above code may be fine for a single-player game, allowing multiple
+players or non-player characters to pick up items introduces a race condition:
+two characters may try to pick up the same item simultaneously. To guard
+against that, use the ``location`` collection to break ties. In
+this case, the code is now the following:
+
+.. code-block:: python
+
+    def pick_up_item(character, item_index, item_id):
+        '''Transfer an item from the current room to the character's inventory'''
+
+        item = item_index[item_id]
+        # Only remove the item if it is still present in the room
+        result = db.location.update(
+            { '_id': character['location']['id'],
+              'contents.id': item_id },
+            { '$pull': { 'contents': { 'id': item_id } } },
+            safe=True)
+        if not result['updatedExisting']:
+            # Conflict is an application-defined exception
+            raise Conflict()
+        # The room update succeeded, so this character owns the item
+        character['inventory'].append(item)
+        db.character.update(
+            { '_id': character['_id'] },
+            { '$push': { 'inventory': item },
+              '$pull': { 'location.contents': { 'id': item_id } } })
+
+By ensuring that the item is present before removing it from the room in the
+``update`` call above, you guarantee that only one player/non-player
+character/monster can pick up the item.
+
+Move the Character to a Different Room
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In your game, suppose the user decides to move north. In this case, you need to
+update the character state to match the new location:
+
+.. 
code-block:: python
+
+    def move(character, direction):
+        '''Move the character to a new location'''
+
+        # Remove character from current location
+        db.location.update(
+            {'_id': character['location']['id'] },
+            {'$pull': {'players': {'id': character['_id'] } } })
+        # Add character to new location, retrieve new location data
+        new_location = db.location.find_and_modify(
+            { '_id': character['location']['exits'][direction] },
+            { '$push': { 'players': {
+                'id': character['_id'],
+                'name': character['name'] } } },
+            new=True)
+        character['location'] = new_location
+        db.character.update(
+            { '_id': character['_id'] },
+            { '$set': { 'location': new_location } })
+
+Here, note that the code updates the old room, the new room, and the character
+document.
+
+Buy an Item
+~~~~~~~~~~~
+
+If your character wants to buy an item, you need to add that item to the
+character's inventory, decrement the character's gold, increment the shopkeeper's
+gold, and update the room:
+
+.. code-block:: python
+
+    def buy(character, shopkeeper, item_index, item_id, price):
+        '''Pick up an item, add to the character's inventory, and transfer
+        payment to the shopkeeper
+        '''
+
+        # Deduct the gold only if the character can afford the item
+        result = db.character.update(
+            { '_id': character['_id'],
+              'gold': { '$gte': price } },
+            { '$inc': { 'gold': -price } },
+            safe=True )
+        if not result['updatedExisting']:
+            # InsufficientFunds is an application-defined exception
+            raise InsufficientFunds()
+        try:
+            pick_up_item(character, item_index, item_id)
+        except:
+            # Add the gold back to the character
+            result = db.character.update(
+                { '_id': character['_id'] },
+                { '$inc': { 'gold': price } } )
+            raise
+        character['gold'] -= price
+        db.character.update(
+            { '_id': shopkeeper['_id'] },
+            { '$inc': { 'gold': price } } )
+
+Note that the code above ensures that the character has sufficient gold to pay for
+the item using the ``updatedExisting`` trick used for picking up items. The race
+condition for item pickup is handled as well, "rolling back" the removal of gold
+from the character's wallet if the item cannot be picked up.
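The guarded update and rollback described above can be tried without a database. The following is only a sketch under stated assumptions: plain dictionaries stand in for the ``character`` and ``location`` documents, and ``conditional_update`` is a hypothetical helper that imitates an update whose query doubles as a precondition (it is not part of PyMongo, and unlike a real MongoDB update it is not atomic across threads or processes).

```python
class Conflict(Exception):
    """Another character already took the item (application-defined)."""


class InsufficientFunds(Exception):
    """The buyer cannot cover the price (application-defined)."""


def conditional_update(doc, predicate, mutate):
    """Hypothetical in-memory stand-in for a guarded update: apply
    `mutate` only if `predicate` holds, and report whether it ran."""
    if not predicate(doc):
        return {'updatedExisting': False}
    mutate(doc)
    return {'updatedExisting': True}


def pick_up_item(room, character, item_id):
    taken = []

    def has_item(r):
        return any(i['id'] == item_id for i in r['contents'])

    def remove_item(r):
        taken.extend(i for i in r['contents'] if i['id'] == item_id)
        r['contents'] = [i for i in r['contents'] if i['id'] != item_id]

    # The room decides ties: the item moves only if it is still there
    if not conditional_update(room, has_item, remove_item)['updatedExisting']:
        raise Conflict(item_id)
    character['inventory'].extend(taken)


def buy(room, character, shopkeeper, item_id, price):
    def can_afford(c):
        return c['gold'] >= price

    def pay(c):
        c['gold'] -= price

    if not conditional_update(character, can_afford, pay)['updatedExisting']:
        raise InsufficientFunds(price)
    try:
        pick_up_item(room, character, item_id)
    except Conflict:
        # Roll back the payment if someone else got the item first
        character['gold'] += price
        raise
    shopkeeper['gold'] += price


room = {'contents': [{'id': 'staff-1', 'name': 'quarterstaff'}]}
tim = {'gold': 523, 'inventory': []}
rival = {'gold': 100, 'inventory': []}
shopkeeper = {'gold': 0}

buy(room, tim, shopkeeper, 'staff-1', 50)
try:
    buy(room, rival, shopkeeper, 'staff-1', 50)
    rival_outcome = 'bought'
except Conflict:
    rival_outcome = 'conflict'
```

In this run the second buyer hits the conflict path and gets the gold back, which mirrors the "rolling back" step in ``buy`` above. With MongoDB the precondition belongs in the update's query document so that the check and the change happen in one operation.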
Sharding -------- -Eventually your system's events will exceed the capacity of a single -event logging database instance. In these situations you will want to +If your system needs to scale beyond a single MongoDB node, you will want to use a :term:`shard cluster`, which takes advantage of MongoDB's -:term:`sharding` functionality. This section introduces the unique -sharding concerns for this use case. +:term:`sharding` functionality. .. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding` wiki page. + +Sharding in this use case is fairly +straightforward, since all our items are always retrieved by ``_id``. To shard +the ``character`` and ``location`` collections, the commands would be the +following: + +.. code-block:: pycon + + >>> db.command('shardcollection', 'character') + { "collectionsharded" : "character", "ok" : 1 } + >>> db.command('shardcollection', 'location')) + { "collectionsharded" : "location", "ok" : 1 } + +Note that there is no need here to specify a :term:`shard key` since MongoDB +shards on ``_id`` by default. From fd419ebbdcdca0096cede2b81f92d02a2cb81294 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Thu, 29 Mar 2012 15:35:59 -0400 Subject: [PATCH 03/13] Add ordered list for itemized explanation Signed-off-by: Rick Copeland --- .../use-cases/gaming-user-state.txt | 36 +++++++++---------- 1 file changed, 17 insertions(+), 19 deletions(-) diff --git a/source/applications/use-cases/gaming-user-state.txt b/source/applications/use-cases/gaming-user-state.txt index 6f71a5808e5..ab44f3a01fa 100644 --- a/source/applications/use-cases/gaming-user-state.txt +++ b/source/applications/use-cases/gaming-user-state.txt @@ -104,25 +104,23 @@ like the following: { qty:1, id:ObjectId('...'), name:"quarterstaff", bonus:2} ] } -There are a few things to note about this document. First, information -about the character's location in the game is encapsulated under the -``location`` attribute. 
Note in particular that all of the information
-necessary to render the room is encapsulated within the user's state
-document. This allows the game system to render the room without
-making a second query to the database to get room information.
-
-Second, notice that the ``armor`` and ``weapons`` attributes contain
-little information about the actual items being worn or carried. This
-information is actually stored under the ``inventory`` property. Since
-the inventory information is stored in the same document, there is no
-need to replicate the detailed information about each item into the
-``armor`` and ``weapons`` properties.
-
-Finally, note that ``inventory`` contains the item details necessary
-for rendering each item in the character's posession, including any
-enchantments (``bonus``) and ``quantity``. Once again, embedding this data
-into the character record means you don't have to perform a separate
-query to fetch item details necessary for display.
+There are a few things to note about this document:
+
+#. Information about the character's location in the game is encapsulated under
+   the ``location`` attribute. Note in particular that all of the information
+   necessary to render the room is encapsulated within the user's state
+   document. This allows the game system to render the room without making a
+   second query to the database to get room information.
+#. The ``armor`` and ``weapons`` attributes contain little information about the
+   actual items being worn or carried. This information is actually stored under
+   the ``inventory`` property. Since the inventory information is stored in the
+   same document, there is no need to replicate the detailed information about
+   each item into the ``armor`` and ``weapons`` properties.
+#. Finally, note that ``inventory`` contains the item details necessary for
+   rendering each item in the character's possession, including any enchantments
+   (``bonus``) and quantity (``qty``).
Once again, embedding this data into the + character record means you don't have to perform a separate query to fetch + item details necessary for display. Operations ---------- From cfc3dde2afa8b50c15b9f91970c12f2bfcdc1668 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Thu, 29 Mar 2012 16:42:02 -0400 Subject: [PATCH 04/13] Fix sharding commands, fix output of shardcollection to be inline with the interactive python session Signed-off-by: Rick Copeland --- .../cms-metadata-and-asset-management.txt | 3 ++- .../use-cases/cms-storing-comments.txt | 5 +--- .../ecommerce-category-hierarchy.txt | 5 ++-- .../ecommerce-inventory-management.txt | 13 ++++------ .../use-cases/ecommerce-product-catalog.txt | 5 ---- .../use-cases/gaming-user-state.txt | 8 +++---- ...ime-analytics-hierarchical-aggregation.txt | 24 +++++++++---------- ...-time-analytics-pre-aggregated-reports.txt | 14 ++--------- 8 files changed, 27 insertions(+), 50 deletions(-) diff --git a/source/applications/use-cases/cms-metadata-and-asset-management.txt b/source/applications/use-cases/cms-metadata-and-asset-management.txt index e3a4886c3e0..c2693f8e7be 100644 --- a/source/applications/use-cases/cms-metadata-and-asset-management.txt +++ b/source/applications/use-cases/cms-metadata-and-asset-management.txt @@ -444,7 +444,8 @@ on the ``_id`` field (none of the node metadata is available on the .. code-block:: python - >>> db.command('shardcollection', 'cms.assets.chunks' + >>> db.command('shardcollection', 'cms.assets.chunks', { + ... 
key: { '_id': 1 } }) { "collectionsharded" : "cms.assets.chunks", "ok" : 1 } This actually still maintains the query-routability constraint, since diff --git a/source/applications/use-cases/cms-storing-comments.txt b/source/applications/use-cases/cms-storing-comments.txt index c4c73821e07..89befea6576 100644 --- a/source/applications/use-cases/cms-storing-comments.txt +++ b/source/applications/use-cases/cms-storing-comments.txt @@ -711,9 +711,6 @@ at the Python/PyMongo console: >>> db.command('shardcollection', 'comment_pages', { ... key : { 'discussion_id' : 1, 'page': 1 } }) + { "collectionsharded" : "comment_pages", "ok" : 1 } -This will return the following response: -.. code-block:: javascript - - { "collectionsharded" : "comment_pages", "ok" : 1 } diff --git a/source/applications/use-cases/ecommerce-category-hierarchy.txt b/source/applications/use-cases/ecommerce-category-hierarchy.txt index 9b03bae98f8..f7996c3fd70 100644 --- a/source/applications/use-cases/ecommerce-category-hierarchy.txt +++ b/source/applications/use-cases/ecommerce-category-hierarchy.txt @@ -242,8 +242,7 @@ the category collection would then be the following: .. code-block:: python - >>> db.command('shardcollection', 'categories') + >>> db.command('shardcollection', 'categories', { + ... key: {'_id': 1} }) { "collectionsharded" : "categories", "ok" : 1 } -Note that there is no need to specify the shard key, as MongoDB will -default to using ``_id`` as a shard key. diff --git a/source/applications/use-cases/ecommerce-inventory-management.txt b/source/applications/use-cases/ecommerce-inventory-management.txt index e718a9606ac..9d525c1a541 100644 --- a/source/applications/use-cases/ecommerce-inventory-management.txt +++ b/source/applications/use-cases/ecommerce-inventory-management.txt @@ -398,15 +398,12 @@ minimize server load. The sharding commands you'd use to shard the cart and inventory collections, then, would be the following: -.. 
code-block:: python - - db.command('shardcollection', 'inventory') - db.command('shardcollection', 'cart') - -.. code-block:: javascript +.. code-block:: pycon + >>> db.command('shardcollection', 'inventory', { + ... 'key': {'_id': 1} }) { "collectionsharded" : "inventory", "ok" : 1 } + >>> db.command('shardcollection', 'cart', { + ... 'key': {'_id': 1} }) { "collectionsharded" : "cart", "ok" : 1 } -Note that there is no need to specify the shard key, as MongoDB will -default to using ``_id`` as a shard key. diff --git a/source/applications/use-cases/ecommerce-product-catalog.txt b/source/applications/use-cases/ecommerce-product-catalog.txt index 578dab5bb0d..006d1e18fe2 100644 --- a/source/applications/use-cases/ecommerce-product-catalog.txt +++ b/source/applications/use-cases/ecommerce-product-catalog.txt @@ -494,11 +494,6 @@ Python/PyMongo console: >>> db.command('shardcollection', 'product', { ... key : { 'type': 1, 'details.genre' : 1, 'sku':1 } }) - -Upon success, you will see the following response: - -.. code-block:: javascript - { "collectionsharded" : "details.genre", "ok" : 1 } .. note:: diff --git a/source/applications/use-cases/gaming-user-state.txt b/source/applications/use-cases/gaming-user-state.txt index ab44f3a01fa..a45c6517d6d 100644 --- a/source/applications/use-cases/gaming-user-state.txt +++ b/source/applications/use-cases/gaming-user-state.txt @@ -436,10 +436,10 @@ following: .. code-block:: pycon - >>> db.command('shardcollection', 'character') + >>> db.command('shardcollection', 'character', { + ... 'key': { '_id': 1 } }) { "collectionsharded" : "character", "ok" : 1 } - >>> db.command('shardcollection', 'location')) + >>> db.command('shardcollection', 'location', { + ... 'key': { '_id': 1 } }) { "collectionsharded" : "location", "ok" : 1 } -Note that there is no need here to specify a :term:`shard key` since MongoDB -shards on ``_id`` by default. 
diff --git a/source/applications/use-cases/real-time-analytics-hierarchical-aggregation.txt b/source/applications/use-cases/real-time-analytics-hierarchical-aggregation.txt index d6aacb77cfd..3457a1f04b3 100644 --- a/source/applications/use-cases/real-time-analytics-hierarchical-aggregation.txt +++ b/source/applications/use-cases/real-time-analytics-hierarchical-aggregation.txt @@ -470,24 +470,22 @@ timestamp) on the events collection. Consider the following: .. code-block:: pycon >>> db.command('shardcollection','events', { - ... key : { 'userid': 1, 'ts' : 1} } ) - -Upon success, you will see the following response: - -.. code-block:: javascript - + ... 'key' : { 'userid': 1, 'ts' : 1} } ) { "collectionsharded": "events", "ok" : 1 } -To shard the aggregated collections you must use the ``_id`` field, -which is the default, so you can issue the following group of shard -operations in the Python/PyMongo shell: +To shard the aggregated collections you must use the ``_id`` field, so you can +issue the following group of shard operations in the Python/PyMongo shell: .. 
code-block:: python - db.command('shardcollection', 'stats.daily') - db.command('shardcollection', 'stats.weekly') - db.command('shardcollection', 'stats.monthly') - db.command('shardcollection', 'stats.yearly') + db.command('shardcollection', 'stats.daily', { + 'key': { '_id': 1 } }) + db.command('shardcollection', 'stats.weekly', { + 'key': { '_id': 1 } }) + db.command('shardcollection', 'stats.monthly', { + 'key': { '_id': 1 } }) + db.command('shardcollection', 'stats.yearly', { + 'key': { '_id': 1 } }) You should also update the ``h_aggregate`` map-reduce wrapper to support sharded output Add ``'sharded':True`` to the ``out`` diff --git a/source/applications/use-cases/real-time-analytics-pre-aggregated-reports.txt b/source/applications/use-cases/real-time-analytics-pre-aggregated-reports.txt index da0a3e0e6eb..00b6d01c82b 100644 --- a/source/applications/use-cases/real-time-analytics-pre-aggregated-reports.txt +++ b/source/applications/use-cases/real-time-analytics-pre-aggregated-reports.txt @@ -621,12 +621,7 @@ collection in the Python/PyMongo console: .. code-block:: pycon >>> db.command('shardcollection', 'stats.daily', { - ... key:{'metadata.site':1,'metadata.page':1,'metadata.date':1}}) - -Upon success, you will see the following response: - -.. code-block:: javascript - + ... 'key':{'metadata.site':1,'metadata.page':1,'metadata.date':1}}) { "collectionsharded" : "stats.daily", "ok" : 1 } Enable sharding for the monthly statistics collection with the @@ -636,12 +631,7 @@ console: .. code-block:: pycon >>> db.command('shardcollection', 'stats.monthly', { - ... key:{'metadata.site':1,'metadata.page':1,'metadata.date':1}}) - -Upon success, you will see the following response: - -.. code-block:: javascript - + ... 'key':{'metadata.site':1,'metadata.page':1,'metadata.date':1}}) { "collectionsharded" : "stats.monthly", "ok" : 1 } .. 
note:: From 58c4d09ec900f282113f5e83ce325b1ce177a061 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Thu, 29 Mar 2012 17:20:38 -0400 Subject: [PATCH 05/13] Add more operations, include location and item collections Signed-off-by: Rick Copeland --- .../use-cases/gaming-user-state.txt | 210 ++++++++++++++---- 1 file changed, 163 insertions(+), 47 deletions(-) diff --git a/source/applications/use-cases/gaming-user-state.txt b/source/applications/use-cases/gaming-user-state.txt index a45c6517d6d..8affe9125ea 100644 --- a/source/applications/use-cases/gaming-user-state.txt +++ b/source/applications/use-cases/gaming-user-state.txt @@ -1,6 +1,6 @@ -================================= -Online Gaming: Storing User State -================================= +=========================================== +Online Gaming: Creating a Role-Playing Game +=========================================== .. default-domain:: mongodb @@ -8,35 +8,44 @@ Overview -------- This document outlines the basic patterns and principals for using -MongoDB as a persistent storage engine for user state data from an online +MongoDB as a persistent storage engine for an online game, particularly one that contains role-playing characteristics. Problem ~~~~~~~ In designing an online game, there is a need to store various -data about the user's character. Some of the attributes might include: +data about the player's character. Some of the attributes might include: Character attributes These might include intrinsic characteristics such as strength, dexterity, charisma, etc., as well as variable characteristics such as health, mana (if your game includes magic), etc. Character inventory - If your game includes the ability for the user to carry around + If your game includes the ability for the player to carry around objects, you will need to keep track of the items carried. 
Character location / relationship to the game world - If your game allows the user to move their character from one + If your game allows the player to move their character from one location to another, this information needs to be stored as well. In addition, you need to store all this data for large numbers of -users who might be playing the game simultaneously, and this data +players who might be playing the game simultaneously, and this data needs to be both readable and writeable with minimal latency in order to ensure responsiveness during gameplay. +In addition to the above data, you also need to store data for + +Items + These include various artifacts that the character might interact with such as + weapons, armor, treasure, etc. +Locations + The various locations in which characters and items might find themselves such + as rooms, halls, etc. + Another consideration when designing the persistence backend for an online game is its flexibility. Particularly in early releases of a game, you may wish to change gameplay mechanics significantly as you -receive feedback from your users. As you implement these changes, you +receive feedback from your players. As you implement these changes, you need to be able to migrate your persistent data from one format to another with minimal (or no) downtime. @@ -44,21 +53,24 @@ Solution ~~~~~~~~ The solution presented by this case study assumes that the read and -write performance for the user state is equally important and must be -accessible with minimal latency. +write performance is equally important and must be accessible with minimal +latency. Schema Design ~~~~~~~~~~~~~ Ultimately, the particulars of your schema depends on the particular -design of your game. When designing your schema for the user state, -you should attempt to encapsulate all the commonly used data into the -user object in order to minimize the number of queries to the database -and the number of seeks in a query. 
If you can manage to -encapsulate all relevant user state into a single document, this +design of your game. When designing your schema, you should attempt to +encapsulate all the commonly used data into a small number of objects in order to +minimize the number of queries to the database and the number of seeks in a +query. Encapsulating all player state into a ``character`` collection, item data +into an ``item`` collection, and location data into a ``location`` collection satisfies both these criteria. -In a role-playing game, then, a typical user state document might look +Character Schema +```````````````` + +In a role-playing game, then, a typical character state document might look like the following: .. code-block:: javascript @@ -84,7 +96,7 @@ like the following: { id:ObjectId('...'), name:'grue' }, { id:ObjectId('...'), name:'Tim' } ], - contents: [ + inventory: [ { qty:1, id:ObjectId('...'), name:'scroll of cause fear' }] }, gold: 523, @@ -94,7 +106,7 @@ like the following: { id:ObjectId('...'), region:'feet'}], weapons: [ {id:ObjectId('...'), hand:'both'} ], inventory: [ - { qty:1, id:ObjectId('...'), name:'backpack', contents: [ + { qty:1, id:ObjectId('...'), name:'backpack', inventory: [ { qty:4, id:ObjectId('...'), name: 'potion of healing'}, { qty:1, id:ObjectId('...'), name: 'scroll of magic mapping'}, { qty:2, id:ObjectId('...'), name: 'c-rations'} ]}, @@ -108,7 +120,7 @@ There are a few things to note about this document: #. Information about the character's location in the game is encapsulated under the ``location`` attribute. Note in particular that all of the information - necessary to render the room is encapsulated within the user's state + necessary to render the room is encapsulated within the character state document. This allows the game system to render the room without making a second #query to the database to get room information. #. 
The ``armor`` and ``weapons`` attributes contain little information about the @@ -116,20 +128,73 @@ There are a few things to note about this document: the ``inventory`` property. Since the inventory information is stored in the same document, there is no need to replicate the detailed information about each item into the ``armor`` and ``weapons`` properties. -#. Finally, note that ``inventory`` contains the item details necessary for +#. ``inventory`` contains the item details necessary for rendering each item in the character's posession, including anyenchantments (``bonus``) and ``quantity``. Once again, embedding this data into the character record means you don't have to perform a separate query to fetch item details necessary for display. +Item Schema +``````````` + +Likewise, the item schema should include all details about all items globally in +the game: + +.. code-block:: javascript + + { + _id: ObjectId('...'), + name: 'backpack', + bonus: null, + inventory: [ + { qty:4, id:ObjectId('...'), name: 'potion of healing'}, + { qty:1, id:ObjectId('...'), name: 'scroll of magic mapping'}, + { qty:2, id:ObjectId('...'), name: 'c-rations'} ]}, + weight: 12, + price: 160, + ... + } + +Note that this document contains more or less the same information as stored in +the ``inventory`` attribute of ``character`` documents, as well as additional +data which may only be needed sporadically in the case of game-play such as +``weight`` and ``price``. + +Location Schema +``````````````` + +Finally, the ``location`` schema specifies the state of the world in the game: + +.. 
code-block:: javascript + + { + id: 'maze-1', + description: 'a maze of twisty little passages...', + exits: {n:'maze-2', s:'maze-1', e:'maze-3'}, + players: [ + { id:ObjectId('...'), name:'grue' }, + { id:ObjectId('...'), name:'Tim' } ], + inventory: [ + { qty:1, id:ObjectId('...'), name:'scroll of cause fear' } ], + } + +Here, note that ``location`` stores exactly the same information as is stored in +the ``location`` attribute of the ``character`` document. You will use +``location`` as the system of record when the game requires interaction between +multiple characters or between characters and non-inventory items. + Operations ---------- -In an online gaming system with the character state stored in a single document, -the primary operations you'll be performing are querying for the character state -document by ``_id``, extracting relevant data for display, and updating various -attributes about the character. This section describes procedures for performing -these queries, extractions, and updates. +In an online gaming system with the state embedded in a single document for +``character``, ``item``, and ``location``, the primary operations you'll be +performing are querying for the character state by ``_id``, extracting relevant +data for display, and updating various attributes about the character. This +section describes procedures for performing these queries, extractions, and +updates. + +In particular you should try *not* to load the ``location`` or ``item`` documents +except when absolutely necessary. The examples that follow use the Python programming language and the :api:`PyMongo ` :term:`driver` for MongoDB, but you @@ -143,11 +208,11 @@ The most basic operation in this system is loading the character state. Query ````` -Use the following query to load the user document from MongoDB: +Use the following query to load the ``character`` document from MongoDB: .. 
code-block:: pycon - >>> character = db.characters.find_one({'_id': user_id}) + >>> character = db.characters.find_one({'_id': character_id}) Index Support ````````````` @@ -158,11 +223,11 @@ sufficient for good performance of this query. Extract Armor and Weapon Data for Display ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In order to save space, the schema described above stores item details only in -the ``inventory`` attribute, storing ``ObjectId``\ s in other locations. To display -these item details, as on a character summary window, you need to merge the -information from the ``armor`` and ``weapons`` attributes with information from -the ``inventory`` attribute. +In order to save space, the ``character`` schema described above stores item +details only in the ``inventory`` attribute, storing ``ObjectId``\ s in other +locations. To display these item details, as on a character summary window, you +need to merge the information from the ``armor`` and ``weapons`` attributes with +information from the ``inventory`` attribute. Suppose, for instance, that your code is displaying the armor data using the following Jinja2 template: @@ -218,11 +283,13 @@ In order to build up this structure, use the following helper functions: result = {} for item in inventory: result[item['_id']] = item - if 'contents' in item: - result.update(get_item_index(item['contents'])) + if 'inventory' in item: + result.update(get_item_index(item['inventory'])) return result def describe_item(item): + '''Add a 'description' field to the given item''' + result = dict(item) if item['bonus']: description = '%+d %s' % (item['bonus'], item['name']) @@ -246,12 +313,12 @@ In order to actually display the armor, then, you would use the following code: .. code-block:: pycon >>> item_index = get_item_index( - ... character['inventory'] + character['location']['contents']) + ... 
character['inventory'] + character['location']['inventory']) >>> armor = get_armor_for_dislay(character, item_index) Note in particular that you are building an index not only for the items the -character is actually carrying in inventory, but also for the items that the user -might interact with in the room. +character is actually carrying in inventory, but also for the items that the +player might interact with in the room. Similarly, in order to display the weapon information, you need to build a structure such as the following: @@ -299,10 +366,10 @@ data, then, is the following: >>> inventory = character['inventory'] >>> room_data = character['location'] -Update Character Inventory -~~~~~~~~~~~~~~~~~~~~~~~~~~ +Pick Up an Item From a Room +~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In your game, suppose the user decides to pick up an item from the room and add +In your game, suppose the player decides to pick up an item from the room and add it to their inventory. In this case, you need to update both the character state and the global location state: @@ -316,10 +383,10 @@ and the global location state: db.character.update( { '_id': character['_id'] }, { '$push': { 'inventory': item }, - '$pull': { 'location.contents': { '_id': item['id'] } } }) + '$pull': { 'location.inventory': { '_id': item['id'] } } }) db.location.update( { '_id': character['location']['id'] }, - { '$pull': { 'contents': { 'id': item_id } } }) + { '$pull': { 'inventory': { 'id': item_id } } }) While the above code may be for a single-player game, if you allow multiple players, or non-player characters, to pick up items, that introduces a problem in @@ -336,8 +403,8 @@ this case, the code is now the following: character['inventory'].append(item) result = db.location.update( { '_id': character['location']['id'], - 'contents.id': item_id }, - { '$pull': { 'contents': { 'id': item_id } } }, + 'inventory.id': item_id }, + { '$pull': { 'inventory': { 'id': item_id } } }, safe=True) if not 
result['updatedExisting']: raise Conflict() @@ -350,10 +417,58 @@ By ensuring that the item is present before removing it from the room in the ``update`` call above, you guarantee that only one player/non-player character/monster can pick up the item. +Remove an Item from a Container +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In the game described here, the ``backpack`` item can contain other +items. You might further suppose that some other items may be similarly +hierarchical (e.g. a chest in a room). Suppose that the player wishes to move an +item from one of these "containers" into their active ``inventory`` as a prelude +to using it. In this case, you need to update both the character state and the +item state: + +.. code-block:: python + + def move_to_active_inventory(character, item_index, container_id, item_id): + '''Transfer an item from the given container to the character's active + inventory + ''' + + result = db.item.update( + { '_id': container_id, + 'inventory.id': item_id }, + { '$pull': { 'inventory': { 'id': item_id } } }, + safe=True) + if not result['updatedExisting']: + raise Conflict() + item = item_index[item_id] + container = item_index[container_id] + character['inventory'].append(item) + container['inventory'] = [ + entry for entry in container['inventory'] + if entry['_id'] != item_id ] + db.character.update( + { '_id': character['_id'] }, + { '$push': { 'inventory': item } } ) + db.character.update( + { '_id': character['_id'], 'inventory.id': container_id }, + { '$pull': { 'inventory.$.inventory': { 'id': item_id } } } ) + +Note in the code above that you: + +- Ensure that the item's state makes this update reasonable (the item is + actually contained within the container). Abort with an error if this is not + true. +- Update the in-memory ``character`` document's inventory, adding the item. +- Update the in-memory ``container`` document's inventory, removing the item. +- Update the ``character`` document in MongoDB. 
+- In the case that the character is moving an item from a container *in his own + inventory*, update the character's inventory representation of the container. + Move the Character to a Different Room ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In your game, suppose the user decides to move north. In this case, you need to +In your game, suppose the player decides to move north. In this case, you need to update the character state to match the new location: .. code-block:: python @@ -389,11 +504,12 @@ gold, and update the room: .. code-block:: python - def buy(character, shopkeeper, item_id, price): + def buy(character, shopkeeper, item_id): '''Pick up an item, add to the character's inventory, and transfer payment to the shopkeeper ''' + price = db.item.find_one({'_id': item_id}, {'price':1})['price'] result = db.character.update( { '_id': character['_id'], 'gold': { '$gte': price } }, From 53c64e882d5e2ea73b7bf3be919de4e1b877c4e3 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Wed, 4 Apr 2012 17:02:28 -0400 Subject: [PATCH 06/13] Fix spelling error Signed-off-by: Rick Copeland --- source/applications/use-cases/ecommerce-product-catalog.txt | 2 +- source/applications/use-cases/gaming-user-state.txt | 2 +- .../use-cases/real-time-analytics-storing-log-data.txt | 2 +- source/applications/use-cases/use-case-template.txt | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/source/applications/use-cases/ecommerce-product-catalog.txt b/source/applications/use-cases/ecommerce-product-catalog.txt index 006d1e18fe2..c7572d87c8b 100644 --- a/source/applications/use-cases/ecommerce-product-catalog.txt +++ b/source/applications/use-cases/ecommerce-product-catalog.txt @@ -7,7 +7,7 @@ E-Commerce: Product Catalog Overview -------- -This document describes the basic patterns and principals for +This document describes the basic patterns and principles for designing an E-Commerce product catalog system using MongoDB as a storage engine. 
diff --git a/source/applications/use-cases/gaming-user-state.txt b/source/applications/use-cases/gaming-user-state.txt index 8affe9125ea..a6df1abc4a5 100644 --- a/source/applications/use-cases/gaming-user-state.txt +++ b/source/applications/use-cases/gaming-user-state.txt @@ -7,7 +7,7 @@ Online Gaming: Creating a Role-Playing Game Overview -------- -This document outlines the basic patterns and principals for using +This document outlines the basic patterns and principles for using MongoDB as a persistent storage engine for an online game, particularly one that contains role-playing characteristics. diff --git a/source/applications/use-cases/real-time-analytics-storing-log-data.txt b/source/applications/use-cases/real-time-analytics-storing-log-data.txt index a11e998e6d6..61da2a0716f 100644 --- a/source/applications/use-cases/real-time-analytics-storing-log-data.txt +++ b/source/applications/use-cases/real-time-analytics-storing-log-data.txt @@ -7,7 +7,7 @@ Real Time Analytics: Storing Log Data Overview -------- -This document outlines the basic patterns and principals for using +This document outlines the basic patterns and principles for using MongoDB as a persistent storage engine for log data from servers and other machine data. diff --git a/source/applications/use-cases/use-case-template.txt b/source/applications/use-cases/use-case-template.txt index 9a8dcef8b44..da5a99f2269 100644 --- a/source/applications/use-cases/use-case-template.txt +++ b/source/applications/use-cases/use-case-template.txt @@ -9,7 +9,7 @@ TODO: Section: Title Overview -------- -This document outlines the basic patterns and principals for using +This document outlines the basic patterns and principles for using MongoDB as a persistent storage engine for TODO: what are we building? 
Problem From 094391628abe050ebe0e9f8ad14fe8e3357e1c75 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Wed, 4 Apr 2012 18:10:03 -0400 Subject: [PATCH 07/13] Begin advertising use case Signed-off-by: Rick Copeland --- .../use-cases/ad-campaign-management.txt | 149 ++++++++++++++++++ source/applications/use-cases/index.txt | 8 + .../use-cases/use-case-template.txt | 2 + 3 files changed, 159 insertions(+) create mode 100644 source/applications/use-cases/ad-campaign-management.txt diff --git a/source/applications/use-cases/ad-campaign-management.txt b/source/applications/use-cases/ad-campaign-management.txt new file mode 100644 index 00000000000..e99d36997cb --- /dev/null +++ b/source/applications/use-cases/ad-campaign-management.txt @@ -0,0 +1,149 @@ +.. -*- rst -*- + +======================================= +Online Advertising: Campaign Management +======================================= + +.. default-domain:: mongodb + +Overview +-------- + +This document outlines the basic patterns and principles for using +MongoDB as a persistent storage engine for an online advertising network. In +particular, this document focuses on creating and maintaining an advertising +campaign with a pre-set daily budget and cost per click (CPC) and cost per +thousand impressions (CPM) limit. + +Problem +~~~~~~~ + +You want to create an advertising network that will serve ads to a variety of +online media. As part of this ad serving, you want to track which ads are +available to be served, based on both the daily budget and the CPC and CPM +limits. + +As part of a campaign, a customer creates one or more *zones*, where +each zone represents some location on a group of pages. Each zone in a campaign +then has one or more ads assigned to it. + +Solution +~~~~~~~~ + +In this solution, you will store each campaign's metadata in its own document, +including budget, limits, targets, and ongoing statistics. This data can be +modified before the campaign starts or during the campaign itself. 
+ +Schema Design +~~~~~~~~~~~~~ + +The schema for campaign management consists of a two collections, one which +stores campaign metadata, and another for campaign statistics. The campaign +metadata collection ``campaign.metadata`` documents have the following +format: + +.. code-block:: javascript + + { + _id: ObjectId(...), + customer_id: ObjectId(...), + title: "August Shoes Campaign", + begin: ISODate("2012-08-01T00:00:00Z"), + end: ISODate("2012-08-31T00:00:00Z"), + zones: { + z1: { + site: 'cnn.com', + page: 'stories/shoes/.*', zone: 'banner', + limit: { type: 'cpm', value: 2000 }, + ad_ids: [ 'ad1', 'ad2' ] }, + z2: { + site: 'cnn.com', + page: 'stories/shoes/.*', zone: 'tower-1a', + limit: { type: 'cpc', value: 45 }, + ad_ids: [ 'ad3' ] } }, + daily_budget: 25000 + } + +The statistics are stored in their own collection ``campaign.stats``: + +.. code-block:: javascript + + { + _id: ObjectId(...), // same as campaign ID + zones: { + z1: { + daily_stats: { + '2012-08-01': { + total: { impressions: 10146, clicks: 198, conversions: 16 }, + ad1: { impressions: ... }, + ad2: { impressions: ... } }, + '2012-08-02': { + total: { impressions: 9182, clicks: 183, conversions: 18 }, + ad1: { impressions: ... }, + ad2: { impressions: ... } }, + '2012-08-03': { + total: { impressions: 9784, clicks: 202, conversions: 21 }, + ad1: { impressions: ... }, + ad2: { impressions: ... } }, + ... + '2012-08-31': { + total: { impressions: 0, clicks: 0, conversions: 0 }, + ad1: { impressions: 0, clicks: 0, conversions: 0 }, + ad2: { impressions: 0, clicks: 0, conversions: 0 } } } + }, + z2: { + daily_stats: { + '2012-08-01': { + total: { impressions: 10457, clicks: 79, conversions: 14 }, + ad3: { impressions: ... } }, + '2012-08-02': { + total: { impressions: 9283, clicks: 53, conversions: 8 }, + ad3: { impressions: ... } }, + '2012-08-03': { + total: { impressions: 9197, clicks: 72, conversions: 14 }, + ad3: { impressions: ... } }, + ... 
+ '2012-08-31': { + total: { impressions: 0, clicks: 0, conversions: 0 }, + ad1: { impressions: 0, clicks: 0, conversions: 0 }, + ad2: { impressions: 0, clicks: 0, conversions: 0 } } } + }], + daily_spent: { + '2012-08-01': 23847, + '2012-08-02': 20749, + ... + '2012-08-12': 0, + '2012-08-13': 0, + ... + '2012-08-31': 0 } + } + +Operations +---------- + +TODO: summary of the operations section + +The examples that follow use the Python programming language and the +:api:`PyMongo ` :term:`driver` for MongoDB, but you +can implement this system using any language you choose. + +Operation 1 +~~~~~~~~~~~ + +TODO: describe what the operation is (optional) + +Query +````` + +TODO: describe query + +Index Support +````````````` + +TODO: describe indexes to optimize this query + +Sharding +-------- + +.. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding` wiki + page. diff --git a/source/applications/use-cases/index.txt b/source/applications/use-cases/index.txt index 5aa6023e118..86b7c52d8db 100644 --- a/source/applications/use-cases/index.txt +++ b/source/applications/use-cases/index.txt @@ -41,3 +41,11 @@ Online Gaming :maxdepth: 2 gaming-user-state + +Online Advertising +------------------ + +.. toctree:: + :maxdepth: 2 + + ad-campaign-management diff --git a/source/applications/use-cases/use-case-template.txt b/source/applications/use-cases/use-case-template.txt index da5a99f2269..4ab5ba7cd5f 100644 --- a/source/applications/use-cases/use-case-template.txt +++ b/source/applications/use-cases/use-case-template.txt @@ -1,3 +1,5 @@ +.. 
-*- rst -*- + :orphan: ==================== From d8184d6cc909147eac3ce07db0006c3ea769af0a Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Fri, 6 Apr 2012 14:00:54 -0400 Subject: [PATCH 08/13] Creating ad serving use case Signed-off-by: Rick Copeland --- .../applications/use-cases/ad-serving-ads.txt | 297 ++++++++++++++++++ .../use-cases/cms-storing-comments.txt | 2 +- source/applications/use-cases/index.txt | 1 + 3 files changed, 299 insertions(+), 1 deletion(-) create mode 100644 source/applications/use-cases/ad-serving-ads.txt diff --git a/source/applications/use-cases/ad-serving-ads.txt b/source/applications/use-cases/ad-serving-ads.txt new file mode 100644 index 00000000000..6d49a718c9d --- /dev/null +++ b/source/applications/use-cases/ad-serving-ads.txt @@ -0,0 +1,297 @@ +.. -*- rst -*- + +============================== +Online Advertising: Ad Serving +============================== + +.. default-domain:: mongodb + +Overview +-------- + +This document outlines the basic patterns and principles for using +MongoDB as a persistent storage engine for an online advertising network. In +particular, this document focuses on the task of deciding *which* ad to serve +when a user visits a particular site. + +Problem +~~~~~~~ + +You want to create an advertising network that will serve ads to a variety of +online media sites. As part of this ad serving, you want to track which ads are +available to be served, and decide on a particular ad to be served in a +particular zone. + +Solution +~~~~~~~~ + +This solution is structured as a progressive refinement of the ad network, +starting out with the basic data storage requirements and adding more advanced +features to the schema to support more advanced ad targeting. The key performance +criterion for this solution is the latency between receiving an ad request and +returning the (targeted) ad to be displayed. 
+ +The examples that follow use the Python programming language and the +:api:`PyMongo ` :term:`driver` for MongoDB, but you +can implement this system using any language you choose. + +Design 1: Basic Ad Serving +-------------------------- + +A basic ad serving algorithm consists of the following steps: + +#. The network receives a request for an ad, specifying at a minimum the + ``site_id`` and ``zone_id`` to be served. +#. The network consults its inventory of ads available to display and chooses an + ad based on various business rules. +#. The network returns the actual ad to be displayed, possibly recording the + decision made as well. + +This design uses the ``site_id`` and ``zone_id`` submitted with the ad request, +as well as information stored in the ad inventory collection, to make the ad +targeting decisions. Later examples will build on this, allowing more advanced ad +targeting. + +Schema Design +~~~~~~~~~~~~~ + +A very basic schema for storing ads available to be served consists of a single +collection, ``ad.zone``: + +.. code-block:: javascript + + { + _id: ObjectId(...), + site_id: 'cnn', + zone_id: 'banner', + ads: [ + { campaign_id: 'mercedes:c201204_sclass_4', + ad_unit_id: 'banner23a', + ecpm: 250 }, + { campaign_id: 'mercedes:c201204_sclass_4', + ad_unit_id: 'banner23b', + ecpm: 250 }, + { campaign_id: 'bmw:c201204_eclass_1', + ad_unit_id: 'banner12', + ecpm: 200 }, + ... ] + } + +Note that for each (``site``, ``zone``) combination you'll be storing a list of +ads, sorted by their ``ecpm`` values. + +Choosing an Ad to Serve +~~~~~~~~~~~~~~~~~~~~~~~ + +The query you'll use to choose which ad to serve selects a compatible ad and +sorts by the advertiser's ``ecpm`` bid in order to maximize the ad network's +profits: + +.. 
code-block:: python + + from itertools import groupby + from random import choice + + def choose_ad(site_id, zone_id): + site = db.ad.inventory.find_one({ + 'site_id': site_id, 'zone_id': zone_id}) + if site is None: return None + if len(site['ads']) == 0: return None + ecpm_groups = groupby(site['ads'], key=lambda ad:ad['ecpm']) + ecpm, ad_group = ecpm_groups.next() + return choice(list(ad_group)) + +Index Support +````````````` + +In order to execute the ad choice with the lowest latency possible, you'll want +to have a compound index on (``site_id``, ``zone_id``): + +.. code-block:: pycon + + >>> db.ad.inventory.ensure_index([ + ... ('site_id', 1), + ... ('zone_id', 1) ]) + +Making an Ad Campaign Inactive +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +One case you'll have to deal with in this solution is making a campaign +inactive. This may happen for a variety of reasons. For instance, the campaign +may have reached its end date or exhausted its budget for the current time +period. In this case, the logic is fairly straightforward: + +.. code-block:: python + + def deactivate_campaign(campaign_id): + db.ad.inventory.update( + { 'ads.campaign_id': campaign_id }, + {' $pull': { 'ads', { 'campaign_id': campaign_id } } }, + multi=True) + +The update statement above first selects only those ad zones which had available +ads from the given ``campaign_id`` and then uses the ``$pull`` modifier to remove +them from rotation. + +Index Support +````````````` + +In order to execute the multi-update quickly, you should maintain an index on the +``ads.campaign_id`` field: + +.. code-block:: pycon + + >>> db.ad.inventory.ensure_index('ads.campaign_id') + +Sharding +~~~~~~~~ + +In order to scale beyond the capacity of a single replica set, you will need to +shard the ``ad.inventory`` collection. To maintain the lowest possible latency in +the ad selection operation, the :term:`shard key` needs to be chosen to allow +MongoDB to route the ``ad.inventory`` query to a single shard. 
In this case, a +good approach is to shard on the (``site_id``, ``zone_id``) combination: + +.. code-block:: pycon + + >>> db.command('shardcollection', 'ad.inventory', { + ... 'key': {'site_id': 1, 'zone_id': 1} }) + { "collectionsharded": "ad.inventory", "ok": 1 } + +Design 2: Adding Frequency Capping +---------------------------------- + +One problem with the logic described in Design 1 above is that it will tend to +display the same ad over and over again until the campaign's budget is +exhausted. To mitigate this, advertisers may wish to limit the frequency with +which a given user is presented a particular ad. This process is called frequency +capping and is an example of user profile targeting in advertising. + +In order to perform frequency capping (or any type of user targeting), the ad +network needs to maintain a profile for each visitor, typically implemented as a +cookie in the user's browser. This cookie, effectively a ``user_id``, is then +transmitted to the ad network when logging impressions, clicks, conversions, +etc., as well as the ad serving decision. This section focuses on how that +profile data impacts the ad serving decision. + +Schema Design +~~~~~~~~~~~~~ + +In order to use the user profile data, you need to store it. In this case, it's +stored in a collection ``ad.user``: + +.. code-block:: javascript + + { + _id: 'cookie_value', + advertisers: { + mercedes: { + impressions: [ + { date: ISODateTime(...), + campaign: 'c201204_sclass_4', + ad_unit_id: 'banner23a', + site_id: 'cnn', + zone_id: 'banner' }, + ... ], + clicks: [ + { date: ISODateTime(...), + campaign: 'c201204_sclass_4', + ad_unit_id: 'banner23a', + site_id: 'cnn', + zone_id: 'banner' }, + ... ] }, + bmw: [ ... ], + ... + } + } + +There are a few things to note about the user profile: + +- Profile information is segmented by advertiser. Typically advertising data is + sensitive competitive information that can't be shared among advertisers, so + this must be kept separate. 
+- All data is embedded in a single profile document. When you need to query this + data (detailed below), you don't necessarily know which advertiser's ads you'll + be showing, so it's a good practice to embed all advertisers in a single + document. +- The event information is grouped by event type within an advertiser, and sorted + by timestamp. This allows rapid lookups of a stream of a particular type of + event. + +Choosing an Ad to Serve +~~~~~~~~~~~~~~~~~~~~~~~ + +The query you'll use to choose which ad to serve now needs to iterate through +ads in order of desirability and select the "best" ad that also satisfies the +advertiser's targeting rules (in this case, the frequency cap): + +.. code-block:: python + + from itertools import groupby + from random import shuffle + from datetime import datetime, timedelta + + def choose_ad(site_id, zone_id, user_id): + site = db.ad.inventory.find_one({ + 'site_id': site_id, 'zone_id': zone_id}) + if site is None or len(site['ads']) == 0: return None + ads = ad_iterator(site['ads']) + user = db.ad.user.find_one({'_id': user_id}) + if user is None: + # any ad is acceptable for an unknown user + return ads.next() + for ad in ads: + advertiser_id = ad['campaign_id'].split(':', 1)[0] + if ad_is_acceptable(ad, user['advertisers'][advertiser_id]): + return ad + return None + + def ad_iterator(ads): + '''Find available ads, sorted by ecpm, with random sort for ties''' + ecpm_groups = groupby(ads, key=lambda ad:ad['ecpm']) + for ecpm, ad_group in ecpm_groups: + ad_group = list(ad_group) + shuffle(ad_group) + for ad in ad_group: yield ad + + def ad_is_acceptable(ad, profile): + '''Returns False if the user has seen the ad today''' + threshold = datetime.utcnow() - timedelta(days=1) + for event in reversed(profile['impressions']): + if event['date'] < threshold: break + if event['ad_unit_id'] == ad['ad_unit_id']: + return False + return True + +Here, the ``choose_ad()`` function provides the framework for your ad selection 
+process. The ``site`` is fetched first, and its ``ads`` are then passed to the ``ad_iterator()`` +function which will yield ads in order of desirability. Each ad is then checked +using the ``ad_is_acceptable()`` function to determine if it meets the +advertiser's rules. + +The ``ad_is_acceptable()`` function then iterates over all the ``impressions`` +stored in the user profile, from most recent to oldest, within a certain +``threshold`` time period (shown here as 1 day). If the same ``ad_unit_id`` +appears in the impression stream, the ad is rejected. Otherwise it is acceptable +and can be shown to the user. + +Index Support +````````````` + +In order to retrieve the user profile with the lowest latency possible, there +needs to be an index on the ``_id`` field, which MongoDB supplies by default. + +Sharding +~~~~~~~~ + +When sharding the ``ad.user`` collection, choosing the ``_id`` field as a +:term:`shard key` allows MongoDB to route queries and updates to the profile: + +.. code-block:: pycon + + >>> db.command('shardcollection', 'ad.user', { + ... 'key': {'_id': 1 } }) + { "collectionsharded": "ad.user", "ok": 1 } + + + diff --git a/source/applications/use-cases/cms-storing-comments.txt b/source/applications/use-cases/cms-storing-comments.txt index 89befea6576..81df945b698 100644 --- a/source/applications/use-cases/cms-storing-comments.txt +++ b/source/applications/use-cases/cms-storing-comments.txt @@ -689,7 +689,7 @@ at the Python/PyMongo console: .. code-block:: pycon >>> db.command('shardcollection', 'comments', { - ... key : { 'discussion_id' : 1, 'full_slug': 1 } }) + ... 'key' : { 'discussion_id' : 1, 'full_slug': 1 } }) This will return the following response: diff --git a/source/applications/use-cases/index.txt b/source/applications/use-cases/index.txt index 86b7c52d8db..b754bb89bd5 100644 --- a/source/applications/use-cases/index.txt +++ b/source/applications/use-cases/index.txt @@ -48,4 +48,5 @@ Online Advertising .. 
toctree:: :maxdepth: 2 + ad-serving-ads ad-campaign-management From 359a54d8df79013ade358b86cbdde2d0e58589a9 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Fri, 6 Apr 2012 17:59:41 -0400 Subject: [PATCH 09/13] Add keyword targeting to the ad network Signed-off-by: Rick Copeland --- .../applications/use-cases/ad-serving-ads.txt | 113 ++++++++++++++++-- 1 file changed, 103 insertions(+), 10 deletions(-) diff --git a/source/applications/use-cases/ad-serving-ads.txt b/source/applications/use-cases/ad-serving-ads.txt index 6d49a718c9d..8d3214d1633 100644 --- a/source/applications/use-cases/ad-serving-ads.txt +++ b/source/applications/use-cases/ad-serving-ads.txt @@ -93,7 +93,7 @@ profits: from random import choice def choose_ad(site_id, zone_id): - site = db.ad.inventory.find_one({ + site = db.ad.zone.find_one({ 'site_id': site_id, 'zone_id': zone_id}) if site is None: return None if len(site['ads']) == 0: return None @@ -109,7 +109,7 @@ to have a compound index on (``site_id``, ``zone_id``): .. code-block:: pycon - >>> db.ad.inventory.ensure_index([ + >>> db.ad.zone.ensure_index([ ... ('site_id', 1), ... ('zone_id', 1) ]) @@ -124,7 +124,7 @@ period. In this case, the logic is fairly straightforward: .. code-block:: python def deactivate_campaign(campaign_id): - db.ad.inventory.update( + db.ad.zone.update( { 'ads.campaign_id': campaign_id }, {' $pull': { 'ads', { 'campaign_id': campaign_id } } }, multi=True) @@ -141,22 +141,22 @@ In order to execute the multi-update quickly, you should maintain an index on th .. code-block:: pycon - >>> db.ad.inventory.ensure_index('ads.campaign_id') + >>> db.ad.zone.ensure_index('ads.campaign_id') Sharding ~~~~~~~~ In order to scale beyond the capacity of a single replica set, you will need to -shard the ``ad.inventory`` collection. To maintain the lowest possible latency in +shard the ``ad.zone`` collection. 
To maintain the lowest possible latency in the ad selection operation, the :term:`shard key` needs to be chosen to allow -MongoDB to route the ``ad.inventory`` query to a single shard. In this case, a +MongoDB to route the ``ad.zone`` query to a single shard. In this case, a good approach is to shard on the (``site_id``, ``zone_id``) combination: .. code-block:: pycon - >>> db.command('shardcollection', 'ad.inventory', { + >>> db.command('shardcollection', 'ad.zone', { ... 'key': {'site_id': 1, 'zone_id': 1} }) - { "collectionsharded": "ad.inventory", "ok": 1 } + { "collectionsharded": "ad.zone", "ok": 1 } Design 2: Adding Frequency Capping ---------------------------------- @@ -232,7 +232,7 @@ advertiser's targeting rules (in this case, the frequency cap): from datetime import datetime, timedelta def choose_ad(site_id, zone_id, user_id): - site = db.ad.inventory.find_one({ + site = db.ad.zone.find_one({ 'site_id': site_id, 'zone_id': zone_id}) if site is None or len(site['ads']) == 0: return None ads = ad_iterator(site['ads']) @@ -293,5 +293,98 @@ When sharding the ``ad.user`` collection, choosing the ``_id`` field as a ... 'key': {'_id': 1 } }) { "collectionsharded": "ad.user", "ok": 1 } +Design 3: Keyword Targeting +--------------------------- + +Where frequency capping above is an example of user profile targeting, you may +also wish to perform content targeting so that the user receives relevant ads for +the particular page being viewed. The simplest example of this is targeting ads +at the result of a search query. In this case, a list of ``keywords`` is sent to +the ``choose_ad()`` call along with the ``site_id``, ``zone_id``, and +``user_id``. + + +Schema Design +~~~~~~~~~~~~~ + +In order to choose relevant ads, you'll need to expand the ``ad.zone`` collection +to store keywords for each ad: + +.. 
code-block:: javascript + + { + _id: ObjectId(...), + site_id: 'cnn', + zone_id: 'search', + ads: [ + { campaign_id: 'mercedes:c201204_sclass_4', + ad_unit_id: 'search1', + keywords: [ 'car', 'luxury', 'style' ], + ecpm: 250 }, + { campaign_id: 'mercedes:c201204_sclass_4', + ad_unit_id: 'search2', + keywords: [ 'car', 'luxury', 'style' ], + ecpm: 250 }, + { campaign_id: 'bmw:c201204_eclass_1', + ad_unit_id: 'search1', + keywords: [ 'car', 'performance' ], + ecpm: 200 }, + ... ] + } + +Choosing a Group of Ads to Serve +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In the approach described here, you'll choose a number of ads that match the +keywords used in the search, so the code below has been modified to return an +iterator over ads in descending order of preference: + +.. code-block:: python + + import math + + def choose_ads(site_id, zone_id, user_id, keywords): + site = db.ad.zone.find_one({ + 'site_id': site_id, 'zone_id': zone_id}) + if site is None: return [] + ads = ad_iterator(site['ads'], keywords) + user = db.ad.user.find_one({'_id': user_id}) + if user is None: return ads + advertisers = user.get('advertisers', {}) + return ( + ad for ad in ads + if ad_is_acceptable(ad, advertisers.get( + ad['campaign_id'].split(':', 1)[0], {})) ) + + def ad_iterator(ads, keywords): + '''Find available ads, sorted by score, with random sort for ties''' + keywords = set(keywords) + scored_ads = [ + (ad_score(ad, keywords), ad) + for ad in ads ] + # sort on the score alone, best first, so ads are never compared + scored_ads.sort(key=lambda pair: pair[0], reverse=True) + score_groups = groupby(scored_ads, key=lambda pair: pair[0]) + for score, ad_group in score_groups: + ad_group = [ ad for _, ad in ad_group ] + shuffle(ad_group) + for ad in ad_group: yield ad + + def ad_score(ad, keywords): + '''Compute a desirability score based on the ad ecpm and keywords''' + matching = set(ad['keywords']).intersection(keywords) + return ad['ecpm'] * math.log( + 1.1 + len(matching)) + + # ad_is_acceptable(ad, profile) is the same as in the + # frequency capping design above + +The main thing to note in the code above is that ads must 
now be sorted according +to some ``score`` which in this case is computed based on a combination of the +``ecpm`` of the ad and the number of keywords matched. More advanced use +cases may boost the importance of various keywords, but this goes beyond the +scope of this use case. One thing to keep in mind is that sorting ads at display +time may cause performance issues if there are a large number of ads competing +for the same display slot. + + - From afdb488fd22681550528455ca6643c7234acaea5 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Wed, 11 Apr 2012 19:11:01 -0400 Subject: [PATCH 10/13] Adding social graph and update document (partial, working through schema design) Signed-off-by: Rick Copeland --- source/applications/use-cases/index.txt | 8 + .../use-cases/social-user-profile.txt | 224 ++++++++++++++++++ 2 files changed, 232 insertions(+) create mode 100644 source/applications/use-cases/social-user-profile.txt diff --git a/source/applications/use-cases/index.txt b/source/applications/use-cases/index.txt index b754bb89bd5..53c4f96c0b6 100644 --- a/source/applications/use-cases/index.txt +++ b/source/applications/use-cases/index.txt @@ -50,3 +50,11 @@ Online Advertising ad-serving-ads ad-campaign-management + +Social Networking +----------------- + +.. toctree:: + :maxdepth: 2 + + social-user-profile diff --git a/source/applications/use-cases/social-user-profile.txt b/source/applications/use-cases/social-user-profile.txt new file mode 100644 index 00000000000..5ad688ba2ad --- /dev/null +++ b/source/applications/use-cases/social-user-profile.txt @@ -0,0 +1,224 @@ +.. -*- rst -*- + +================================== +Social Networking: Storing Updates +================================== + +.. default-domain:: mongodb + +Overview +-------- + +This document outlines the basic patterns and principles for using +MongoDB as a persistent storage engine for a social network. 
In particular, this +document focuses on the task of storing and displaying user updates. + +Problem +~~~~~~~ + +You want to create a social network that will store profile information about +each user as well as allow the user to create various types of posts and updates +which will then be seen on their "friends'" walls. + +Solution +~~~~~~~~ + +The solution described below assumes a *directed* social graph where a user can +choose whether or not to follow another user. The solution is designed in such a +way as to minimize the number of documents that must be loaded in order to +display any given page, even at the expense of complicating updates. + +The particulars of what type of data you want ot host on your social network +obviously depends on the type of social network you are designing, and is largely +beyond the scope of this use case. In particular, the main variables that you +will have to consider in adapting this use case to your particular situation are: + +What data should be in a user profile + This may include gender, age, interests, relationship status, etc. for a + "casual" social network, or may include resume-type data for a more + "business-oriented" social network. +What type of updates are allowed + Again, depending on what flavor of social network you are designing, you may + wish to allow posts such as status updates, photos, links, checkins, and + polls, or you may wish to restrict your users to links and status updates. + +Schema Design +~~~~~~~~~~~~~ + +In the solution presented here, you will use two main "independent" collections +and two "dependent" collections to store user profile data and posts. + +Independent Collections +``````````````````````` + +The first +collection, ``social.user``, stores the social graph information for a given user +along with the user's profile data: + +.. code-block:: javascript + + { + _id: 'T4Y...AC', // base64-encoded ObjectId + name: 'Rick', + profile: { ... age, location, interests, etc. ... 
}, + followers: { + "T4Y...AD": { name: 'Jared' }, + "T4Y...AE": { name: 'Max' }, + "T4Y...AF": { name: 'Bernie' }, + "T4Y...AH": { name: 'Paul' }, + ... + ], + circles: { + work: { + "T4Y...AD": { name: 'Jared' }, + "T4Y...AE": { name: 'Max' }, + "T4Y...AF": { name: 'Bernie' }, + "T4Y...AH": { name: 'Paul' }, + ... }, + ...} + ] + } + +There are a few things ot note about this schema: + +- Rather than using a "raw" ``ObjectId`` for your ``_id`` field, you'll use a + base64-encoded version. This allows you to use ``_id`` values as keys in + subdocuments, which both reduces the memory footprint of these subdocuments as + well as speeding up some operations. +- The users being "followed" are broken into ``circles`` to facilitate sharing + with a subgroup. +- The ``followers`` subdocument is technically redundant, since it can be + computed from the ``circles`` property. Having ``followers`` available on the + ``social.user`` document, however, is useful both for displaying the user's + followers on the profile or "wall" page, as well as propagating posts to other + users, as you'll see below. +- The particular profile data stored for the user is isolated into the + ``profile`` subdocument, allowing you to evolve the schema as necessary without + worrying about introducing bugs into the social graph. + +Of course, to make the network interesting, it's necessary to add various types of +posts. These are stored in the ``social.post`` collection: + +.. code-block:: javascript + + { + _id: ObjectId(...), + by: { id: "T4Y...AE", name: 'Max' }, + type: 'status', + ts: ISODateTime(...), + detail: { + text: 'Loving MongoDB' }, + comments: [ + { by: { id:"T4Y...AG", name: 'Dwight' }, + ts: ISODateTime(...), + text: 'Right on!' }, + ... all comments listed ... ] + } + +Here, the post stores the author information (``by``), the post ``type``, a +timestamp ``ts``, post details ``detail`` (which vary by post type), and a +``comments`` array. 
In this case, the schema embeds all comments on a post as a +time-sorted flat array. For a more in-depth exploration of the other approaches +to storing comments, please see the document +:doc:`CMS: Storing Comments `. + +One thing to note about the ``social.post`` collection is that it encapsulates +the polymorphic ``detail`` subdocument which would store different data for a +photo post versus a status update, for example. + +Dependent Collections +``````````````````````` + +social.wall (block of X most recent [partial] posts on user's wall) + +.. code-block:: javascript + + { + _id: ObjectId(...), + user_id: "T4Y...AE", + num_posts: 42, + posts: [ + { id: ObjectId(...), + ts: ISODateTime(...), + by: { id: "T4Y...AE", name: 'Max' }, + type: 'status', + detail: { text: 'Loving MongoDB' }, + comments: [ + { by: { id: "T4Y...AG", name: 'Dwight', + ts: ISODateTime(...), + text: 'Right on!' }, + ... only last X comments listed ... + ] + }, + { id: ObjectId(...), + ts: ISODateTime(...), + by: { id: "T4Y...AE", name: 'Max' }, + type: 'checkin', + detail: { + text: 'Great office!', + geo: [ 40.724348,-73.997308 ], + name: '10gen Office', + photo: 'http://....' }, + comments: [ + { by: { id: "T4Y...AD", name: 'Jared' }, + ts: ISODateTime(...), + text: 'Wrong coast!' }, + ... only last X comments listed ... + ] + }, + { id: ObjectId(...), + ts: ISODateTime(...), + by: { id: "T4Y...g9", name: 'Rick' }, + type: 'status', + detail: { + text: 'So when do you crush Oracle?' }, + comments: [ + { by: { id: "T4Y...AE", name: 'Max' }, + ts: ISODateTime(...), + text: 'Soon... ;-)' }, + ... only last X comments listed ... + ] + }, + ] + } + +social.news (block of X most recent [partial] posts on user's news feed) + +.. code-block:: javascript + + { + _id: ObjectId(...), + user_id: "T4Y...AE", + num_posts: 42, + posts: [ ... 
] + } + +Operations +---------- + +TODO: summary of the operations section + +The examples that follow use the Python programming language and the +:api:`PyMongo ` :term:`driver` for MongoDB, but you +can implement this system using any language you choose. + +Operation 1 +~~~~~~~~~~~ + +TODO: describe what the operation is (optional) + +Query +````` + +TODO: describe query + +Index Support +````````````` + +TODO: describe indexes to optimize this query + +Sharding +-------- + +.. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding` wiki + page. From 954b714cd5f39e1909bd0676042c7082d309e1ec Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Thu, 12 Apr 2012 18:03:39 -0400 Subject: [PATCH 11/13] More details and operations Signed-off-by: Rick Copeland --- .../use-cases/social-user-profile.txt | 250 ++++++++++++++++-- 1 file changed, 222 insertions(+), 28 deletions(-) diff --git a/source/applications/use-cases/social-user-profile.txt b/source/applications/use-cases/social-user-profile.txt index 5ad688ba2ad..d50382036da 100644 --- a/source/applications/use-cases/social-user-profile.txt +++ b/source/applications/use-cases/social-user-profile.txt @@ -24,20 +24,22 @@ Solution ~~~~~~~~ The solution described below assumes a *directed* social graph where a user can -choose whether or not to follow another user. The solution is designed in such a +choose whether or not to follow another user. Additionally, the user can +designate "circles" of users to follow, in order to facilitate fine-grained +control of privacy. The solution presented below is designed in such a way as to minimize the number of documents that must be loaded in order to display any given page, even at the expense of complicating updates. 
-The particulars of what type of data you want ot host on your social network +The particulars of what type of data you want to host on your social network obviously depends on the type of social network you are designing, and is largely beyond the scope of this use case. In particular, the main variables that you will have to consider in adapting this use case to your particular situation are: -What data should be in a user profile +What data should be in a user profile? This may include gender, age, interests, relationship status, etc. for a "casual" social network, or may include resume-type data for a more "business-oriented" social network. -What type of updates are allowed +What type of updates are allowed? Again, depending on what flavor of social network you are designing, you may wish to allow posts such as status updates, photos, links, checkins, and polls, or you may wish to restrict your users to links and status updates. @@ -46,7 +48,7 @@ Schema Design ~~~~~~~~~~~~~ In the solution presented here, you will use two main "independent" collections -and two "dependent" collections to store user profile data and posts. +and three "dependent" collections to store user profile data and posts. Independent Collections ``````````````````````` @@ -62,39 +64,43 @@ along with the user's profile data: name: 'Rick', profile: { ... age, location, interests, etc. ... }, followers: { - "T4Y...AD": { name: 'Jared' }, - "T4Y...AE": { name: 'Max' }, - "T4Y...AF": { name: 'Bernie' }, - "T4Y...AH": { name: 'Paul' }, + "T4Y...AD": { name: 'Jared', circles: [ 'python', 'authors'] }, + "T4Y...AF": { name: 'Bernie', circles: [ 'python' ] }, + "T4Y...AI": { name: 'Meghan', circles: [ 'python', 'speakers' ] }, ... ], circles: { - work: { + "10gen": { "T4Y...AD": { name: 'Jared' }, "T4Y...AE": { name: 'Max' }, "T4Y...AF": { name: 'Bernie' }, "T4Y...AH": { name: 'Paul' }, ... 
}, ...} - ] + }, + blocked: ['gh1...0d'], + pages: { wall: 4, news: 3 } } -There are a few things ot note about this schema: +There are a few things to note about this schema: - Rather than using a "raw" ``ObjectId`` for your ``_id`` field, you'll use a base64-encoded version. This allows you to use ``_id`` values as keys in subdocuments, which both reduces the memory footprint of these subdocuments as well as speeding up some operations. -- The users being "followed" are broken into ``circles`` to facilitate sharing - with a subgroup. -- The ``followers`` subdocument is technically redundant, since it can be - computed from the ``circles`` property. Having ``followers`` available on the - ``social.user`` document, however, is useful both for displaying the user's - followers on the profile or "wall" page, as well as propagating posts to other - users, as you'll see below. +- The social graph is stored bidirectionally in the ``followers`` and ``circles`` + collections. While this is technically redundant, having the bidirectional + connections is userful both for displaying the user's followers on the profile + page, as well as propagating posts to other users, as shown below. +- In addition to the normal "positive" social graph, the schema above also stores + a block list which contains an array of user ids for posters whose posts never + appear on the user's wall or news feed. - The particular profile data stored for the user is isolated into the ``profile`` subdocument, allowing you to evolve the schema as necessary without worrying about introducing bugs into the social graph. +- The ``pages`` property is used to store the number of pages in the + ``social.wall``, and ``social.news`` collections for this + particular user. These will be used below when creating new posts. Of course, to make the network interesting, it's necessary to add various types of posts. These are stored in the ``social.post`` collection: @@ -104,6 +110,7 @@ posts. 
These are stored in the ``social.post`` collection: { _id: ObjectId(...), by: { id: "T4Y...AE", name: 'Max' }, + circles: [ '*public*' ], type: 'status', ts: ISODateTime(...), detail: { @@ -115,32 +122,53 @@ posts. These are stored in the ``social.post`` collection: ... all comments listed ... ] } -Here, the post stores the author information (``by``), the post ``type``, a +Here, the post stores minimal author information (``by``), the post ``type``, a timestamp ``ts``, post details ``detail`` (which vary by post type), and a ``comments`` array. In this case, the schema embeds all comments on a post as a time-sorted flat array. For a more in-depth exploration of the other approaches to storing comments, please see the document :doc:`CMS: Storing Comments `. -One thing to note about the ``social.post`` collection is that it encapsulates -the polymorphic ``detail`` subdocument which would store different data for a -photo post versus a status update, for example. +A couple of points are worthy of further discussion: + +- Author information is truncated; just enough is stored in each ``by`` property + to display the author name and a link to the author profile. If your user + wants more detail on a particular author, you can fetch this information as + they request it. Storing minimal information like this helps keep the document + small (and therefore fast.) +- The visibility of the post is controlled via the ``circles`` property; any user + that is part of one of the listed circles can view the post. The special values + ``"\*public*"`` and ``"\*circles*"`` allow the user to share a post with the + whole world or with any users in any of the posting user's circles, respectively. +- Different types of posts may contain different types of data in the ``detail`` + field. Isolating this polymorphic information into a subdocument is a good + practice, helping you to clearly see which parts of the document are common to + all posts and which can vary. 
In this case, you would store different data for + a photo post versus a status update, while still keeping the metadata (``_id``, + ``by``, ``circles``, ``type``, ``ts``, and ``comments``) the same. Dependent Collections -``````````````````````` - -social.wall (block of X most recent [partial] posts on user's wall) +````````````````````` + +In addition to the independent collections above, for optimal performance you'll +need to create a few dependent collections that will be used to cache +information for display. The first of these collections is the ``social.wall`` +collection, and is intended to display a "wall" containing posts created by or +directed to a particular user. The format of the ``social.wall`` collection +follows. .. code-block:: javascript { _id: ObjectId(...), user_id: "T4Y...AE", + page: 4, num_posts: 42, posts: [ { id: ObjectId(...), ts: ISODateTime(...), by: { id: "T4Y...AE", name: 'Max' }, + circles: [ '*public*' ], type: 'status', detail: { text: 'Loving MongoDB' }, comments: [ @@ -153,6 +181,7 @@ social.wall (block of X most recent [partial] posts on user's wall) { id: ObjectId(...), ts: ISODateTime(...), by: { id: "T4Y...AE", name: 'Max' }, + circles: [ '*circles*' ], type: 'checkin', detail: { text: 'Great office!', @@ -169,6 +198,7 @@ social.wall (block of X most recent [partial] posts on user's wall) { id: ObjectId(...), ts: ISODateTime(...), by: { id: "T4Y...g9", name: 'Rick' }, + circles: [ '10gen' ], type: 'status', detail: { text: 'So when do you crush Oracle?' }, @@ -179,16 +209,35 @@ social.wall (block of X most recent [partial] posts on user's wall) ... only last X comments listed ... ] }, + ... ] } -social.news (block of X most recent [partial] posts on user's news feed) +There are a few things to note about this schema: + +- Each post is listed with an abbreviated number of comments (3 might be + typical.) This is to keep the size of the document reasonable. 
If you need to + display more comments on a post, you would then query the ``social.post`` + collection for full details. +- There are actually multiple ``social.wall`` documents for each ``social.user`` + document. This allows the system to keep a "page" of recent posts in the + initial page view, fetching older "pages" if requested. A ``page`` property + keeps track of the position of this page of posts on the user's overall wall + timeline along with the timestamps on individual posts. +- Once again, the ``by`` properties store only the minimal author information for + display, helping to keep this document small. + +The other dependent collection you'll use is ``social.news``, posts from people +the user follows. This schema includes much of the same information as the +``social.wall`` information, so the document below has been abbreviated for +clarity: .. code-block:: javascript { _id: ObjectId(...), user_id: "T4Y...AE", + page: 3, num_posts: 42, posts: [ ... ] } @@ -196,12 +245,157 @@ social.news (block of X most recent [partial] posts on user's news feed) Operations ---------- -TODO: summary of the operations section +Since the schemas above optimize for read performance at the possible expense +of write performance, you should ideally provide a queueing system for +processing updates which may take longer than your desired web request latency. The examples that follow use the Python programming language and the :api:`PyMongo ` :term:`driver` for MongoDB, but you can implement this system using any language you choose. +Viewing a News Feed or Wall Posts +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The most common operation on a social network is probably the display of a +particular user's news feed, followed by a user's wall posts. Since the +``social.news`` and ``social.wall`` collections are optimized for these +operations, the query is fairly straightforward. 
Since these two collections +share a schema, viewing the posts for a news feed or a wall are actually quite +similar operations, and can be supported by the same code: + +.. code-block:: python + + def get_posts(collection, user_id, page=None): + spec = { 'user_id': user_id } + if page is not None: + spec['page'] = {'$lte': page} + cur = collection.find(spec) + cur = cur.sort('page', -1) + for page in cur: + for post in reversed(page['posts']): + yield page['page'], post + +The function ``get_posts`` above will retrieve all the posts on a particular user's +wall or news feed in reverse-chronological order. Some special handling is +required to efficiently achieve the reverse-chronological ordering: + +- The ``posts`` within a page are actually stored in chronological order, so the + order of these posts must be reversed before displaying. +- As a user pages through her wall, it's preferable to avoid fetching the first + few pages from the server each time. To achieve this, the code above specifies + the first page to fetch in the ``page`` argument, passing this in as an + ``$lte`` expression in the query. +- Rather than only yielding the post itself, the post's page is also yielded from + the generator. This provides the ``page`` argument used in any subsequent calls + to ``get_posts``. + +There is one other issue that needs to be considered in selecting posts for +display: privacy settings. In order to handle privacy issues effectively, you'll +need to use some filter functions on the posts generated above by ``get_posts``. The +first of these filters is used to determine whether to show a post when the user +is viewing his or her own wall: + +.. 
code-block:: python + + def visible_on_own_wall(user, post): + '''if poster is followed by user, post is visible''' + for circle, users in user['circles'].items(): + if post['by']['id'] in users: return True + return False + +In addition to the user's wall, your social network might provide an "incoming" +page that contains all posts directed towards a user regardless of whether that +poster is followed by the user. In this case, you would use a block list +to filter posts: + +.. code-block:: python + + def visible_on_own_incoming(user, post): + '''if poster is not blocked by user, post is visible''' + return post['by']['id'] not in user['blocked'] + +When viewing a news feed or another user's wall, the permission check is a bit +different based on the post's ``circles`` property: + +.. code-block:: python + + def visible_post(user, post): + if post['circles'] == ['*public*']: + # public posts always visible + return True + circles_user_is_in = set( + user['followers'].get(post['by']['id'], {}).get('circles', [])) + if not circles_user_is_in: + # user is not circled by poster; post is invisible + return False + if post['circles'] == ['*circles*']: + # post is public to all followed users; post is visible + return True + for circle in post['circles']: + if circle in circles_user_is_in: + # User is in a circle receiving this post + return True + return False + +Index Support +````````````` + +In order to quickly retrieve the pages in the desired order, you'll need an index +on (``user_id``, ``page``) in both the ``social.news`` and ``social.wall`` +collections. Since this combination is in fact unique, you should go ahead and +specify ``unique=True`` for the index (this will become important later). + +.. code-block:: pycon + + >>> db.social.news.ensure_index([ + ... ('user_id', 1), + ... ('page', -1)], + ... unique=True) + >>> db.social.wall.ensure_index([ + ... ('user_id', 1), + ... ('page', -1)], + ... unique=True) + + +Creating a New Post +~~~~~~~~~~~~~~~~~~~ + +.. 
code-block:: python + + from datetime import datetime + POSTS_PER_PAGE=25 + + def post(user, dest_user, type, detail, circles): + ts = datetime.utcnow() + post = { + 'ts': ts, + 'by': { id: user['id'], name: user['name'] }, + 'circles': circles, + 'type': type, + 'detail': detail, + 'comments': [] } + # Update global post collection + db.social.post.insert(post) + if dest_user in user['followers'] + result = db.social.wall.update( + { 'user_id': user['id'], 'page': user['wall_pages'] } + + +Commenting on a Post +~~~~~~~~~~~~~~~~~~~~ + +Adding a User to a Circle +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Removing a User from a Circle +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Viewing a User's Profile +~~~~~~~~~~~~~~~~~~~~~~~~ + +Another common read operation on social networks is viewing a user's profile, +including their wall posts. The code is actually quite similar to the code for + Operation 1 ~~~~~~~~~~~ From 8bedbafaa568117f4a74e78a0754d47d35fface9 Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Fri, 13 Apr 2012 16:37:31 -0400 Subject: [PATCH 12/13] Finish draft of social networking doc Signed-off-by: Rick Copeland --- .../use-cases/social-user-profile.txt | 298 ++++++++++++++---- 1 file changed, 236 insertions(+), 62 deletions(-) diff --git a/source/applications/use-cases/social-user-profile.txt b/source/applications/use-cases/social-user-profile.txt index d50382036da..df3aafeef8c 100644 --- a/source/applications/use-cases/social-user-profile.txt +++ b/source/applications/use-cases/social-user-profile.txt @@ -78,8 +78,7 @@ along with the user's profile data: ... }, ...} }, - blocked: ['gh1...0d'], - pages: { wall: 4, news: 3 } + blocked: ['gh1...0d'] } There are a few things to note about this schema: @@ -97,10 +96,6 @@ There are a few things to note about this schema: appear on the user's wall or news feed. 
- The particular profile data stored for the user is isolated into the ``profile`` subdocument, allowing you to evolve the schema as necessary without - worrying about introducing bugs into the social graph. -- The ``pages`` property is used to store the number of pages in the - ``social.wall``, and ``social.news`` collections for this - particular user. These will be used below when creating new posts. Of course, to make the network interesting, it's necessary to add various types of posts. These are stored in the ``social.post`` collection: @@ -162,8 +157,7 @@ follows. { _id: ObjectId(...), user_id: "T4Y...AE", - page: 4, - num_posts: 42, + month: '201204', posts: [ { id: ObjectId(...), ts: ISODateTime(...), @@ -171,14 +165,15 @@ follows. circles: [ '*public*' ], type: 'status', detail: { text: 'Loving MongoDB' }, + comments_shown: 3, comments: [ { by: { id: "T4Y...AG", name: 'Dwight' }, ts: ISODateTime(...), text: 'Right on!' }, - ... only last X comments listed ... + ... only last 3 comments listed ... ] }, - { id: ObjectId(...), + { id: ObjectId(...), ts: ISODateTime(...), by: { id: "T4Y...AE", name: 'Max' }, circles: [ '*circles*' ], @@ -188,11 +183,12 @@ follows. geo: [ 40.724348,-73.997308 ], name: '10gen Office', photo: 'http://....' }, + comments_shown: 1, comments: [ { by: { id: "T4Y...AD", name: 'Jared' }, ts: ISODateTime(...), text: 'Wrong coast!' }, - ... only last X comments listed ... + ... only last 1 comment listed ... ] }, { id: ObjectId(...), @@ -202,11 +198,12 @@ follows. type: 'status', detail: { text: 'So when do you crush Oracle?' }, + comments_shown: 2, comments: [ { by: { id: "T4Y...AE", name: 'Max' }, ts: ISODateTime(...), text: 'Soon... ;-)' }, - ... only last X comments listed ... + ... only last 2 comments listed ... ] }, ... @@ -220,12 +217,13 @@ There are a few things to note about this schema: display more comments on a post, you would then query the ``social.post`` collection for full details. 
- There are actually multiple ``social.wall`` documents for each ``social.user`` - document. This allows the system to keep a "page" of recent posts in the - initial page view, fetching older "pages" if requested. A ``page`` property - keeps track of the position of this page of posts on the user's overall wall - timeline along with the timestamps on individual posts. + document, one wall document per month. This allows the system to keep a "page" of + recent posts in the initial page view, fetching older months if requested. - Once again, the ``by`` properties store only the minimal author information for display, helping to keep this document small. +- The number of comments on each post is stored to allow later updates to find + posts with more than a certain number of comments since the ``$size`` query + operator does not allow inequality comparisons. The other dependent collection you'll use is ``social.news``, posts from people the user follows. This schema includes much of the same information as the @@ -237,8 +235,7 @@ clarity: { _id: ObjectId(...), user_id: "T4Y...AE", - page: 3, - num_posts: 42, + month: '201204', posts: [ ... ] } @@ -265,29 +262,29 @@ similar operations, and can be supported by the same code: .. code-block:: python - def get_posts(collection, user_id, page=None): + def get_posts(collection, user_id, month=None): spec = { 'user_id': user_id } - if page is not None: - spec['page'] = {'$lte': page} + if month is not None: + spec['month'] = {'$lte': month} cur = collection.find(spec) - cur = cur.sort('page', -1) + cur = cur.sort('month', -1) for page in cur: for post in reversed(page['posts']): - yield page['page'], post + yield page['month'], post The function ``get_posts`` above will retrieve all the posts on a particular user's wall or news feed in reverse-chronological order. 
Some special handling is required to efficiently achieve the reverse-chronological ordering: -- The ``posts`` within a page are actually stored in chronological order, so the +- The ``posts`` within a month are actually stored in chronological order, so the order of these posts must be reversed before displaying. - As a user pages through her wall, it's preferable to avoid fetching the first - few pages from the server each time. To achieve this, the code above specifies - the first page to fetch in the ``page`` argument, passing this in as an + few months from the server each time. To achieve this, the code above specifies + the first month to fetch in the ``month`` argument, passing this in as an ``$lte`` expression in the query. -- Rather than only yielding the post itself, the post's page is also yielded from - the generator. This provides the ``page`` argument used in any subsequent calls - to ``get_posts``. +- Rather than only yielding the post itself, the post's month is also yielded from + the generator. This provides the ``month`` argument to be used in any + subsequent calls to ``get_posts``. There is one other issue that needs to be considered in selecting posts for display: privacy settings. In order to handle privacy issues effectively, you'll @@ -341,32 +338,113 @@ Index Support ````````````` In order to quickly retrieve the pages in the desired order, you'll need an index -on (``user_id``, ``page``) in both the ``social.news`` and ``social.wall`` -collections. Since this combination is in fact unique, you should go ahead and -specify ``unique=True`` for the index (this will become important later). +on (``user_id``, ``month``) in both the ``social.news`` and ``social.wall`` +collections. .. code-block:: pycon - >>> db.social.news.ensure_index([ - ... ('user_id', 1), - ... ('page', -1)], - ... unique=True) - >>> db.social.wall.ensure_index([ - ... ('user_id', 1), - ... ('page', -1)], - ... 
unique=True) + >>> for collection in (db.social.news, db.social.wall): + ... collection.ensure_index([ + ... ('user_id', 1), + ... ('month', -1)]) + +Commenting on a Post +~~~~~~~~~~~~~~~~~~~~ + +Other than viewing walls and news feeds, commenting on posts is the next most +common action taken on social networks. To create a comment by ``user`` on a +given ``post`` containing the given ``text``, you'll need to execute code similar +to the following: + +.. code-block:: python + + from datetime import datetime + + def comment(user, post_id, text): + ts = datetime.utcnow() + month = ts.strftime('%Y%m') + comment = { + 'by': { 'id': user['id'], 'name': user['name'] }, + 'ts': ts, + 'text': text } + # Update the social.post collection + db.social.post.update( + { '_id': post_id }, + { '$push': { 'comments': comment } } ) + # Update social.wall and social.news collections + db.social.wall.update( + { 'posts.id': post_id }, + { '$push': { 'posts.$.comments': comment }, + '$inc': { 'posts.$.comments_shown': 1 } }, + upsert=False, + multi=True) + db.social.news.update( + { 'posts.id': post_id }, + { '$push': { 'posts.$.comments': comment }, + '$inc': { 'posts.$.comments_shown': 1 } }, + upsert=False, + multi=True) + +.. note:: + + One thing to note in this function is the presence of a couple of ``multi=True`` + update statements. Since these can potentially take quite a long time, this + function is a good candidate for processing 'out of band' with the regular + request-response flow of your application. + +The code above can actually result in an unbounded number of comments being +inserted into the ``social.wall`` and ``social.news`` collections. To compensate +for this, you should periodically run the following update statements to truncate +the number of displayed comments and keep the size of the news and wall documents +manageable: + +.. 
code-block:: python + + COMMENTS_SHOWN = 3 + + def truncate_extra_comments(): + db.social.news.update( + { 'posts.comments_shown': { '$gt': COMMENTS_SHOWN } }, + { '$pop': { 'posts.$.comments': -1 }, + '$inc': { 'posts.$.comments_shown': -1 } }, + multi=True) + db.social.wall.update( + { 'posts.comments_shown': { '$gt': COMMENTS_SHOWN } }, + { '$pop': { 'posts.$.comments': -1 }, + '$inc': { 'posts.$.comments_shown': -1 } }, + multi=True) +Index Support +````````````` +In order to execute the updates to the ``social.news`` and ``social.wall`` +collections shown above efficiently, you'll need to be able to quickly locate both +of the following types of documents: + +- Documents containing a given post +- Documents containing posts displaying too many comments + +To quickly execute these updates, then, you'll need to create the following +indexes: + +.. code-block:: pycon + + >>> for collection in (db.social.news, db.social.wall): + ... collection.ensure_index('posts.id') + ... collection.ensure_index('posts.comments_shown') Creating a New Post ~~~~~~~~~~~~~~~~~~~ +Creating a new post fills out the content-creation activities on a social +network: + .. 
code-block:: python from datetime import datetime - POSTS_PER_PAGE=25 def post(user, dest_user, type, detail, circles): ts = datetime.utcnow() + month = ts.strftime('%Y%m') post = { 'ts': ts, 'by': { 'id': user['id'], 'name': user['name'] }, @@ -376,43 +454,139 @@ Creating a New Post 'comments': [] } # Update global post collection db.social.post.insert(post) - if dest_user in user['followers'] - result = db.social.wall.update( - { 'user_id': user['id'], 'page': user['wall_pages'] } - + # Copy to dest user's wall + if user['id'] not in dest_user['blocked']: + append_post(db.social.wall, [dest_user['id']], month, post) + # Copy to followers' news feeds + if circles == ['*public*']: + dest_userids = set(user['followers'].keys()) + else: + dest_userids = set() + if circles == [ '*circles*' ]: + circles = user['circles'].keys() + for circle in circles: + dest_userids.update(user['circles'][circle]) + append_post(db.social.news, dest_userids, month, post) + +The basic sequence of operations in the code above is the following: + +#. The post is first saved into the "system of record," the ``social.post`` + collection. +#. The recipient's wall is updated with the post. +#. The news feeds of everyone who is 'circled' in the post are updated with the + post. + +Updating a particular wall or group of news feeds is then accomplished using the +``append_post`` function: -Commenting on a Post -~~~~~~~~~~~~~~~~~~~~ +.. code-block:: python -Adding a User to a Circle -~~~~~~~~~~~~~~~~~~~~~~~~~ + def append_post(collection, dest_userids, month, post): + collection.update( + { 'user_id': { '$in': sorted(dest_userids) }, + 'month': month }, + { '$push': { 'posts': post } }, + multi=True) -Removing a User from a Circle -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Index Support +````````````` + +In order to quickly update the ``social.wall`` and ``social.news`` collections, +you'll once again need an index on both ``user_id`` and ``month``. 
This time, +however, the optimal order on the indexes is (``month``, ``user_id``). This is +due to the fact that updates to these collections will always be for the current +month; having month appear first in the index makes the index *right-aligned*, +requiring significantly less memory to store the active part of the index. -Viewing a User's Profile -~~~~~~~~~~~~~~~~~~~~~~~~ +To actually create this index, you'll need to execute the following commands: -Another common read operation on social networks is viewing a user's profile, -including their wall posts. The code is actually quite similar to the code for +.. code-block:: pycon -Operation 1 -~~~~~~~~~~~ + >>> for collection in (db.social.news, db.social.wall): + ... collection.ensure_index([ + ... ('month', 1), + ... ('user_id', 1)]) -TODO: describe what the operation is (optional) -Query -````` +Maintaining the Social Graph +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -TODO: describe query +In your social network, maintaining the social graph is an infrequent but +essential operation. To add a user ``other`` to the current user +``self``\'s circles, you'll need to run the following function: + +.. code-block:: python + + def circle_user(self, other, circle): + circles_path = 'circles.%s.%s' % (circle, other['_id']) + db.social.user.update( + { '_id': self['_id'] }, + { '$set': { circles_path: { 'name': other['name'] } } }) + follower_circles = 'followers.%s.circles' % self['_id'] + follower_name = 'followers.%s.name' % self['_id'] + db.social.user.update( + { '_id': other['_id'] }, + { '$push': { follower_circles: circle }, + '$set': { follower_name: self['name'] } }) + +Note that in this solution, previous posts of the ``other`` user are not added to +the ``self`` user's news feed or wall. To actually include these past posts would +be an expensive and complex operation, and goes beyond the scope of this use case. + +Of course, you'll also need to support *removing* users from circles: + +.. 
code-block:: python + + def uncircle_user(self, other, circle): + circles_path = 'circles.%s.%s' % (circle, other['_id']) + db.social.user.update( + { '_id': self['_id'] }, + { '$unset': { circles_path: 1 } }) + follower_circles = 'followers.%s.circles' % self['_id'] + db.social.user.update( + { '_id': other['_id'] }, + { '$pull': { follower_circles: circle } }) + # Special case -- 'other' is completely uncircled + db.social.user.update( + { '_id': other['_id'], follower_circles: {'$size': 0 } }, + { '$unset': { 'followers.' + self['_id']: 1 } }) Index Support ````````````` -TODO: describe indexes to optimize this query +In both the circling and uncircling cases, the ``_id`` is included in the update +queries, so no additional indexes are required. Sharding -------- +In order to scale beyond the capacity of a single replica set, you will need to +shard each of the collections mentioned above. Since the ``social.user``, +``social.wall``, and ``social.news`` collections contain documents which are +specific to a given user, the user's ``_id`` field is an appropriate shard key: + +.. code-block:: pycon + + >>> db.command('shardcollection', 'social.user', { + ... 'key': {'_id': 1 } } ) + { "collectionsharded": "social.user", "ok": 1 } + >>> db.command('shardcollection', 'social.wall', { + ... 'key': {'user_id': 1 } } ) + { "collectionsharded": "social.wall", "ok": 1 } + >>> db.command('shardcollection', 'social.news', { + ... 'key': {'user_id': 1 } } ) + { "collectionsharded": "social.news", "ok": 1 } + +It turns out that using the posting user's ``_id`` is actually *not* the best +choice for a shard key for ``social.post``. This is due to the fact that queries +and updates to this collection are done using the ``_id`` field, and sharding on +``by.id``, while tempting, would require these updates to be *broadcast* to all +shards. 
To shard the ``social.post`` collection on ``_id``, then, you'll need to +execute the following command:: + + >>> db.command('shardcollection', 'social.post', { + ... 'key': {'_id': 1 } } ) + { "collectionsharded": "social.post", "ok": 1 } + .. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding`" wiki page. From 2a78298c152d9919057da73699ac62a79d06d40b Mon Sep 17 00:00:00 2001 From: Rick Copeland Date: Fri, 13 Apr 2012 21:24:47 -0400 Subject: [PATCH 13/13] Make a note indicating that the index on (month, user_id) is extraneous Signed-off-by: Rick Copeland --- .../use-cases/social-user-profile.txt | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/source/applications/use-cases/social-user-profile.txt b/source/applications/use-cases/social-user-profile.txt index df3aafeef8c..b94ecca4a3a 100644 --- a/source/applications/use-cases/social-user-profile.txt +++ b/source/applications/use-cases/social-user-profile.txt @@ -498,15 +498,12 @@ due to the fact that updates to these collections will always be for the current month; having month appear first in the index makes the index *right-aligned*, requiring significantly less memory to store the active part of the index. -To actually create this index, you'll need to execute the following commands: - -.. code-block:: pycon - - >>> for collection in (db.social.news, db.social.wall): - ... collection.ensure_index([ - ... ('month', 1), - ... ('user_id', 1)]) - +*However*, in this case, since you have already defined an index on (``user_id``, + ``month``), which *must* be in that order so that you can do the sort on + ``month``, adding a second index is unnecessary, and would end up actually using + more RAM to maintain two indexes. So even though this particular operation would + benefit from having an index on (``month``, ``user_id``), it's best to leave out + any additional indexes here. Maintaining the Social Graph ~~~~~~~~~~~~~~~~~~~~~~~~~~~~