diff --git a/source/applications/use-cases/ad-campaign-management.txt b/source/applications/use-cases/ad-campaign-management.txt
new file mode 100644
index 00000000000..e99d36997cb
--- /dev/null
+++ b/source/applications/use-cases/ad-campaign-management.txt
@@ -0,0 +1,149 @@
+.. -*- rst -*-
+
+=======================================
+Online Advertising: Campaign Management
+=======================================
+
+.. default-domain:: mongodb
+
+Overview
+--------
+
+This document outlines the basic patterns and principles for using
+MongoDB as a persistent storage engine for an online advertising network. In
+particular, this document focuses on creating and maintaining an advertising
+campaign with a pre-set daily budget and cost per click (CPC) and cost per
+thousand impressions (CPM) limits.
+
+Problem
+~~~~~~~
+
+You want to create an advertising network that will serve ads to a variety of
+online media. As part of this ad serving, you want to track which ads are
+available to be served, based on both the daily budget and the CPC and CPM
+limits.
+
+As part of a campaign, a customer creates one or more *zones*, where
+each zone represents some location on a group of pages. Each zone in a campaign
+then has one or more ads assigned to it.
+
+Solution
+~~~~~~~~
+
+In this solution, you will store each campaign's metadata in its own document,
+including budget, limits, targets, and ongoing statistics. This data can be
+modified before the campaign starts or during the campaign itself.
+
+Schema Design
+~~~~~~~~~~~~~
+
+The schema for campaign management consists of two collections, one which
+stores campaign metadata, and another for campaign statistics. Documents in
+the campaign metadata collection, ``campaign.metadata``, have the following
+format:
+
+.. 
code-block:: javascript
+
+   {
+     _id: ObjectId(...),
+     customer_id: ObjectId(...),
+     title: "August Shoes Campaign",
+     begin: ISODate("2012-08-01T00:00:00Z"),
+     end: ISODate("2012-08-31T00:00:00Z"),
+     zones: {
+       z1: {
+         site: 'cnn.com',
+         page: 'stories/shoes/.*', zone: 'banner',
+         limit: { type: 'cpm', value: 2000 },
+         ad_ids: [ 'ad1', 'ad2' ] },
+       z2: {
+         site: 'cnn.com',
+         page: 'stories/shoes/.*', zone: 'tower-1a',
+         limit: { type: 'cpc', value: 45 },
+         ad_ids: [ 'ad3' ] } },
+     daily_budget: 25000
+   }
+
+The statistics are stored in their own collection ``campaign.stats``:
+
+.. code-block:: javascript
+
+   {
+     _id: ObjectId(...), // same as campaign ID
+     zones: {
+       z1: {
+         daily_stats: {
+           '2012-08-01': {
+             total: { impressions: 10146, clicks: 198, conversions: 16 },
+             ad1: { impressions: ... },
+             ad2: { impressions: ... } },
+           '2012-08-02': {
+             total: { impressions: 9182, clicks: 183, conversions: 18 },
+             ad1: { impressions: ... },
+             ad2: { impressions: ... } },
+           '2012-08-03': {
+             total: { impressions: 9784, clicks: 202, conversions: 21 },
+             ad1: { impressions: ... },
+             ad2: { impressions: ... } },
+           ...
+           '2012-08-31': {
+             total: { impressions: 0, clicks: 0, conversions: 0 },
+             ad1: { impressions: 0, clicks: 0, conversions: 0 },
+             ad2: { impressions: 0, clicks: 0, conversions: 0 } } }
+       },
+       z2: {
+         daily_stats: {
+           '2012-08-01': {
+             total: { impressions: 10457, clicks: 79, conversions: 14 },
+             ad3: { impressions: ... } },
+           '2012-08-02': {
+             total: { impressions: 9283, clicks: 53, conversions: 8 },
+             ad3: { impressions: ... } },
+           '2012-08-03': {
+             total: { impressions: 9197, clicks: 72, conversions: 14 },
+             ad3: { impressions: ... } },
+           ...
+           '2012-08-31': {
+             total: { impressions: 0, clicks: 0, conversions: 0 },
+             ad3: { impressions: 0, clicks: 0, conversions: 0 } } }
+       } },
+     daily_spent: {
+       '2012-08-01': 23847,
+       '2012-08-02': 20749,
+       ...
+       '2012-08-12': 0,
+       '2012-08-13': 0,
+       ...
+       '2012-08-31': 0 }
+   }
+
+Operations
+----------
+
+TODO: summary of the operations section
+
+The examples that follow use the Python programming language and the
+:api:`PyMongo ` :term:`driver` for MongoDB, but you
+can implement this system using any language you choose.
+
+Operation 1
+~~~~~~~~~~~
+
+TODO: describe what the operation is (optional)
+
+Query
+`````
+
+TODO: describe query
+
+Index Support
+`````````````
+
+TODO: describe indexes to optimize this query
+
+Sharding
+--------
+
+.. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding`" wiki
+   page.
diff --git a/source/applications/use-cases/ad-serving-ads.txt b/source/applications/use-cases/ad-serving-ads.txt
new file mode 100644
index 00000000000..8d3214d1633
--- /dev/null
+++ b/source/applications/use-cases/ad-serving-ads.txt
@@ -0,0 +1,390 @@
+.. -*- rst -*-
+
+==============================
+Online Advertising: Ad Serving
+==============================
+
+.. default-domain:: mongodb
+
+Overview
+--------
+
+This document outlines the basic patterns and principles for using
+MongoDB as a persistent storage engine for an online advertising network. In
+particular, this document focuses on the task of deciding *which* ad to serve
+when a user visits a particular site.
+
+Problem
+~~~~~~~
+
+You want to create an advertising network that will serve ads to a variety of
+online media sites. As part of this ad serving, you want to track which ads are
+available to be served, and decide on a particular ad to be served in a
+particular zone.
+
+Solution
+~~~~~~~~
+
+This solution is structured as a progressive refinement of the ad network,
+starting out with the basic data storage requirements and adding features
+to the schema to support more advanced ad targeting. The key performance
+criterion for this solution is the latency between receiving an ad request and
+returning the (targeted) ad to be displayed.
+ +The examples that follow use the Python programming language and the +:api:`PyMongo ` :term:`driver` for MongoDB, but you +can implement this system using any language you choose. + +Design 1: Basic Ad Serving +-------------------------- + +A basic ad serving algorithm consists of the following steps: + +#. The network receives a request for an ad, specifying at a minimum the + ``site_id`` and ``zone_id`` to be served. +#. The network consults its inventory of ads available to display and chooses an + ad based on various business rules. +#. The network returns the actual ad to be displayed, possibly recording the + decision made as well. + +This design uses the ``site_id`` and ``zone_id`` submitted with the ad request, +as well as information stored in the ad inventory collection, to make the ad +targeting decisions. Later examples will build on this, allowing more advanced ad +targeting. + +Schema Design +~~~~~~~~~~~~~ + +A very basic schema for storing ads available to be served consists of a single +collection, ``ad.zone``: + +.. code-block:: javascript + + { + _id: ObjectId(...), + site_id: 'cnn', + zone_id: 'banner', + ads: [ + { campaign_id: 'mercedes:c201204_sclass_4', + ad_unit_id: 'banner23a', + ecpm: 250 }, + { campaign_id: 'mercedes:c201204_sclass_4', + ad_unit_id: 'banner23b', + ecpm: 250 }, + { campaign_id: 'bmw:c201204_eclass_1', + ad_unit_id: 'banner12', + ecpm: 200 }, + ... ] + } + +Note that for each (``site``, ``zone``) combination you'll be storing a list of +ads, sorted by their ``ecpm`` values. + +Choosing an Ad to Serve +~~~~~~~~~~~~~~~~~~~~~~~ + +The query you'll use to choose which ad to serve selects a compatible ad and +sorts by the advertiser's ``ecpm`` bid in order to maximize the ad network's +profits: + +.. 
code-block:: python
+
+   from itertools import groupby
+   from random import choice
+
+   def choose_ad(site_id, zone_id):
+       site = db.ad.zone.find_one({
+           'site_id': site_id, 'zone_id': zone_id})
+       if site is None: return None
+       if len(site['ads']) == 0: return None
+       # ads are stored sorted by ecpm, so the first group has the highest ecpm
+       ecpm_groups = groupby(site['ads'], key=lambda ad:ad['ecpm'])
+       ecpm, ad_group = next(ecpm_groups)
+       return choice(list(ad_group))
+
+Index Support
+`````````````
+
+In order to execute the ad choice with the lowest latency possible, you'll want
+to have a compound index on (``site_id``, ``zone_id``):
+
+.. code-block:: pycon
+
+   >>> db.ad.zone.ensure_index([
+   ...     ('site_id', 1),
+   ...     ('zone_id', 1) ])
+
+Making an Ad Campaign Inactive
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+One case you'll have to deal with in this solution is making a campaign
+inactive. This may happen for a variety of reasons. For instance, the campaign
+may have reached its end date or exhausted its budget for the current time
+period. In this case, the logic is fairly straightforward:
+
+.. code-block:: python
+
+   def deactivate_campaign(campaign_id):
+       db.ad.zone.update(
+           { 'ads.campaign_id': campaign_id },
+           { '$pull': { 'ads': { 'campaign_id': campaign_id } } },
+           multi=True)
+
+The update statement above first selects only those ad zones that have available
+ads from the given ``campaign_id`` and then uses the ``$pull`` modifier to remove
+them from rotation.
+
+Index Support
+`````````````
+
+In order to execute the multi-update quickly, you should maintain an index on the
+``ads.campaign_id`` field:
+
+.. code-block:: pycon
+
+   >>> db.ad.zone.ensure_index('ads.campaign_id')
+
+Sharding
+~~~~~~~~
+
+In order to scale beyond the capacity of a single replica set, you will need to
+shard the ``ad.zone`` collection. To maintain the lowest possible latency in
+the ad selection operation, the :term:`shard key` needs to be chosen to allow
+MongoDB to route the ``ad.zone`` query to a single shard. 
In this case, a
+good approach is to shard on the (``site_id``, ``zone_id``) combination:
+
+.. code-block:: pycon
+
+   >>> db.command('shardcollection', 'ad.zone', {
+   ...     'key': {'site_id': 1, 'zone_id': 1} })
+   { "collectionsharded": "ad.zone", "ok": 1 }
+
+Design 2: Adding Frequency Capping
+----------------------------------
+
+One problem with the logic described in Design 1 above is that it will tend to
+display the same ad over and over again until the campaign's budget is
+exhausted. To mitigate this, advertisers may wish to limit the frequency with
+which a given user is presented a particular ad. This process is called frequency
+capping and is an example of user profile targeting in advertising.
+
+In order to perform frequency capping (or any type of user targeting), the ad
+network needs to maintain a profile for each visitor, typically implemented as a
+cookie in the user's browser. This cookie, effectively a ``user_id``, is then
+transmitted to the ad network when logging impressions, clicks, conversions,
+etc., as well as when requesting an ad. This section focuses on how that
+profile data impacts the ad serving decision.
+
+Schema Design
+~~~~~~~~~~~~~
+
+In order to use the user profile data, you need to store it. In this case, it's
+stored in a collection ``ad.user``:
+
+.. code-block:: javascript
+
+   {
+     _id: 'cookie_value',
+     advertisers: {
+       mercedes: {
+         impressions: [
+           { date: ISODate(...),
+             campaign: 'c201204_sclass_4',
+             ad_unit_id: 'banner23a',
+             site_id: 'cnn',
+             zone_id: 'banner' },
+           ... ],
+         clicks: [
+           { date: ISODate(...),
+             campaign: 'c201204_sclass_4',
+             ad_unit_id: 'banner23a',
+             site_id: 'cnn',
+             zone_id: 'banner' },
+           ... ] },
+       bmw: { ... },
+       ...
+     }
+   }
+
+There are a few things to note about the user profile:
+
+- Profile information is segmented by advertiser. Typically advertising data is
+  sensitive competitive information that can't be shared among advertisers, so
+  this must be kept separate.
+- All data is embedded in a single profile document. When you need to query this
+  data (detailed below), you don't necessarily know which advertiser's ads you'll
+  be showing, so it's a good practice to embed all advertisers in a single
+  document.
+- The event information is grouped by event type within an advertiser, and sorted
+  by timestamp. This allows rapid lookups of a stream of a particular type of
+  event.
+
+Choosing an Ad to Serve
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The query you'll use to choose which ad to serve now needs to iterate through
+ads in order of desirability and select the "best" ad that also satisfies the
+advertiser's targeting rules (in this case, the frequency cap):
+
+.. code-block:: python
+
+   from itertools import groupby
+   from random import shuffle
+   from datetime import datetime, timedelta
+
+   def choose_ad(site_id, zone_id, user_id):
+       site = db.ad.zone.find_one({
+           'site_id': site_id, 'zone_id': zone_id})
+       if site is None or len(site['ads']) == 0: return None
+       ads = ad_iterator(site['ads'])
+       # the profile is keyed by the cookie value stored in _id
+       user = db.ad.user.find_one({'_id': user_id})
+       if user is None:
+           # any ad is acceptable for an unknown user
+           return next(ads)
+       for ad in ads:
+           advertiser_id = ad['campaign_id'].split(':', 1)[0]
+           # no history for this advertiser means an empty profile
+           profile = user['advertisers'].get(advertiser_id, {})
+           if ad_is_acceptable(ad, profile):
+               return ad
+       return None
+
+   def ad_iterator(ads):
+       '''Find available ads, sorted by ecpm, with random sort for ties'''
+       ecpm_groups = groupby(ads, key=lambda ad:ad['ecpm'])
+       for ecpm, ad_group in ecpm_groups:
+           ad_group = list(ad_group)
+           shuffle(ad_group)
+           for ad in ad_group: yield ad
+
+   def ad_is_acceptable(ad, profile):
+       '''Returns False if the user has seen the ad today'''
+       threshold = datetime.utcnow() - timedelta(days=1)
+       for event in reversed(profile.get('impressions', [])):
+           if event['date'] < threshold: break
+           if event['ad_unit_id'] == ad['ad_unit_id']:
+               return False
+       return True
+
+Here, the ``choose_ad()`` function provides the framework for your ad selection 
+process. The ``site`` document is fetched first, and its ``ads`` list is passed
+to the ``ad_iterator()`` function, which will yield ads in order of
+desirability. Each ad is then checked using the ``ad_is_acceptable()`` function
+to determine if it meets the advertiser's rules.
+
+The ``ad_is_acceptable()`` function then iterates over all the ``impressions``
+stored in the user profile, from most recent to oldest, within a certain
+``threshold`` time period (shown here as 1 day). If the same ``ad_unit_id``
+appears in the impression stream, the ad is rejected. Otherwise it is acceptable
+and can be shown to the user.
+
+Index Support
+`````````````
+
+In order to retrieve the user profile with the lowest latency possible, there
+needs to be an index on the ``_id`` field, which MongoDB supplies by default.
+
+Sharding
+~~~~~~~~
+
+When sharding the ``ad.user`` collection, choosing the ``_id`` field as a
+:term:`shard key` allows MongoDB to route queries and updates for a given
+profile to a single shard:
+
+.. code-block:: pycon
+
+   >>> db.command('shardcollection', 'ad.user', {
+   ...     'key': {'_id': 1 } })
+   { "collectionsharded": "ad.user", "ok": 1 }
+
+Design 3: Keyword Targeting
+---------------------------
+
+Where frequency capping above is an example of user profile targeting, you may
+also wish to perform content targeting so that the user receives relevant ads for
+the particular page being viewed. The simplest example of this is targeting ads
+at the result of a search query. In this case, a list of ``keywords`` is sent to
+the ``choose_ad()`` call along with the ``site_id``, ``zone_id``, and
+``user_id``.
+
+
+Schema Design
+~~~~~~~~~~~~~
+
+In order to choose relevant ads, you'll need to expand the ``ad.zone`` collection
+to store keywords for each ad:
+
+.. 
code-block:: javascript
+
+   {
+     _id: ObjectId(...),
+     site_id: 'cnn',
+     zone_id: 'search',
+     ads: [
+       { campaign_id: 'mercedes:c201204_sclass_4',
+         ad_unit_id: 'search1',
+         keywords: [ 'car', 'luxury', 'style' ],
+         ecpm: 250 },
+       { campaign_id: 'mercedes:c201204_sclass_4',
+         ad_unit_id: 'search2',
+         keywords: [ 'car', 'luxury', 'style' ],
+         ecpm: 250 },
+       { campaign_id: 'bmw:c201204_eclass_1',
+         ad_unit_id: 'search1',
+         keywords: [ 'car', 'performance' ],
+         ecpm: 200 },
+       ... ]
+   }
+
+Choosing a Group of Ads to Serve
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In the approach described here, you'll choose a number of ads that match the
+keywords used in the search, so the code below has been modified to return an
+iterator over ads in descending order of preference:
+
+.. code-block:: python
+
+   import math
+   from itertools import groupby
+   from random import shuffle
+
+   def choose_ads(site_id, zone_id, user_id, keywords):
+       site = db.ad.zone.find_one({
+           'site_id': site_id, 'zone_id': zone_id})
+       if site is None: return []
+       ads = ad_iterator(site['ads'], keywords)
+       user = db.ad.user.find_one({'_id': user_id})
+       if user is None: return ads
+       def advertiser_id(ad):
+           return ad['campaign_id'].split(':', 1)[0]
+       # filter lazily so each ad is drawn from the iterator only once
+       return (
+           ad for ad in ads
+           if ad_is_acceptable(
+               ad, user['advertisers'].get(advertiser_id(ad), {})) )
+
+   def ad_iterator(ads, keywords):
+       '''Find available ads, sorted by score, with random sort for ties'''
+       keywords = set(keywords)
+       scored_ads = [
+           (ad_score(ad, keywords), ad)
+           for ad in ads ]
+       # sort on the score only, highest score first
+       scored_ads.sort(key=lambda pair: pair[0], reverse=True)
+       score_groups = groupby(scored_ads, key=lambda pair: pair[0])
+       for score, ad_group in score_groups:
+           ad_group = [ ad for _, ad in ad_group ]
+           shuffle(ad_group)
+           for ad in ad_group: yield ad
+
+   def ad_score(ad, keywords):
+       '''Compute a desirability score based on the ad ecpm and keywords'''
+       matching = set(ad.get('keywords', [])).intersection(keywords)
+       return ad['ecpm'] * math.log(
+           1.1 + len(matching))
+
+   # ad_is_acceptable(ad, profile) is the same as in Design 2 above
+
+The main thing to note in the code above is that ads must 
now be sorted according +to some ``score`` which in this case is computed based on a combination of the +``ecpm`` of the ad as well as the number of keywords matched. More advanced use +cases may boost the importance of various keywords, but this goes beyond the +scope of this use case. One thing to keep in mind is that sorting ads at +ad display time may cause performance issues if there are a +large number of ads competing for the same display slot. + + + diff --git a/source/applications/use-cases/cms-metadata-and-asset-management.txt b/source/applications/use-cases/cms-metadata-and-asset-management.txt index e3a4886c3e0..c2693f8e7be 100644 --- a/source/applications/use-cases/cms-metadata-and-asset-management.txt +++ b/source/applications/use-cases/cms-metadata-and-asset-management.txt @@ -444,7 +444,8 @@ on the ``_id`` field (none of the node metadata is available on the .. code-block:: python - >>> db.command('shardcollection', 'cms.assets.chunks' + >>> db.command('shardcollection', 'cms.assets.chunks', { + ... 'key': { '_id': 1 } }) { "collectionsharded" : "cms.assets.chunks", "ok" : 1 } This actually still maintains the query-routability constraint, since diff --git a/source/applications/use-cases/cms-storing-comments.txt b/source/applications/use-cases/cms-storing-comments.txt index c4c73821e07..81df945b698 100644 --- a/source/applications/use-cases/cms-storing-comments.txt +++ b/source/applications/use-cases/cms-storing-comments.txt @@ -689,7 +689,7 @@ at the Python/PyMongo console: .. code-block:: pycon >>> db.command('shardcollection', 'comments', { - ... key : { 'discussion_id' : 1, 'full_slug': 1 } }) + ... 'key' : { 'discussion_id' : 1, 'full_slug': 1 } }) This will return the following response: @@ -711,9 +711,6 @@ at the Python/PyMongo console: >>> db.command('shardcollection', 'comment_pages', { ... 
key : { 'discussion_id' : 1, 'page': 1 } }) + { "collectionsharded" : "comment_pages", "ok" : 1 } -This will return the following response: -.. code-block:: javascript - - { "collectionsharded" : "comment_pages", "ok" : 1 } diff --git a/source/applications/use-cases/ecommerce-category-hierarchy.txt b/source/applications/use-cases/ecommerce-category-hierarchy.txt index 9b03bae98f8..f7996c3fd70 100644 --- a/source/applications/use-cases/ecommerce-category-hierarchy.txt +++ b/source/applications/use-cases/ecommerce-category-hierarchy.txt @@ -242,8 +242,7 @@ the category collection would then be the following: .. code-block:: python - >>> db.command('shardcollection', 'categories') + >>> db.command('shardcollection', 'categories', { + ... 'key': {'_id': 1} }) { "collectionsharded" : "categories", "ok" : 1 } -Note that there is no need to specify the shard key, as MongoDB will -default to using ``_id`` as a shard key. diff --git a/source/applications/use-cases/ecommerce-inventory-management.txt b/source/applications/use-cases/ecommerce-inventory-management.txt index e718a9606ac..9d525c1a541 100644 --- a/source/applications/use-cases/ecommerce-inventory-management.txt +++ b/source/applications/use-cases/ecommerce-inventory-management.txt @@ -398,15 +398,12 @@ minimize server load. The sharding commands you'd use to shard the cart and inventory collections, then, would be the following: -.. code-block:: python - - db.command('shardcollection', 'inventory') - db.command('shardcollection', 'cart') - -.. code-block:: javascript +.. code-block:: pycon + >>> db.command('shardcollection', 'inventory', { + ... 'key': {'_id': 1} }) { "collectionsharded" : "inventory", "ok" : 1 } + >>> db.command('shardcollection', 'cart', { + ... 'key': {'_id': 1} }) { "collectionsharded" : "cart", "ok" : 1 } -Note that there is no need to specify the shard key, as MongoDB will -default to using ``_id`` as a shard key. 
diff --git a/source/applications/use-cases/ecommerce-product-catalog.txt b/source/applications/use-cases/ecommerce-product-catalog.txt index 578dab5bb0d..c7572d87c8b 100644 --- a/source/applications/use-cases/ecommerce-product-catalog.txt +++ b/source/applications/use-cases/ecommerce-product-catalog.txt @@ -7,7 +7,7 @@ E-Commerce: Product Catalog Overview -------- -This document describes the basic patterns and principals for +This document describes the basic patterns and principles for designing an E-Commerce product catalog system using MongoDB as a storage engine. @@ -494,11 +494,6 @@ Python/PyMongo console: >>> db.command('shardcollection', 'product', { ... key : { 'type': 1, 'details.genre' : 1, 'sku':1 } }) - -Upon success, you will see the following response: - -.. code-block:: javascript - { "collectionsharded" : "details.genre", "ok" : 1 } .. note:: diff --git a/source/applications/use-cases/gaming-user-state.txt b/source/applications/use-cases/gaming-user-state.txt new file mode 100644 index 00000000000..a6df1abc4a5 --- /dev/null +++ b/source/applications/use-cases/gaming-user-state.txt @@ -0,0 +1,561 @@ +=========================================== +Online Gaming: Creating a Role-Playing Game +=========================================== + +.. default-domain:: mongodb + +Overview +-------- + +This document outlines the basic patterns and principles for using +MongoDB as a persistent storage engine for an online +game, particularly one that contains role-playing characteristics. + +Problem +~~~~~~~ + +In designing an online game, there is a need to store various +data about the player's character. Some of the attributes might include: + +Character attributes + These might include intrinsic characteristics such as strength, + dexterity, charisma, etc., as well as variable characteristics such + as health, mana (if your game includes magic), etc. 
+Character inventory
+   If your game includes the ability for the player to carry around
+   objects, you will need to keep track of the items carried.
+Character location / relationship to the game world
+   If your game allows the player to move their character from one
+   location to another, this information needs to be stored as well.
+
+In addition, you need to store all this data for large numbers of
+players who might be playing the game simultaneously, and this data
+needs to be both readable and writeable with minimal latency in order
+to ensure responsiveness during gameplay.
+
+In addition to the above data, you also need to store data for:
+
+Items
+   These include various artifacts that the character might interact with such as
+   weapons, armor, treasure, etc.
+Locations
+   The various locations in which characters and items might find themselves such
+   as rooms, halls, etc.
+
+Another consideration when designing the persistence backend for an
+online game is its flexibility. Particularly in early releases of a
+game, you may wish to change gameplay mechanics significantly as you
+receive feedback from your players. As you implement these changes, you
+need to be able to migrate your persistent data from one format to
+another with minimal (or no) downtime.
+
+Solution
+~~~~~~~~
+
+The solution presented by this case study assumes that read and
+write performance are equally important and that data must be accessible
+with minimal latency.
+
+Schema Design
+~~~~~~~~~~~~~
+
+Ultimately, the particulars of your schema depend on the particular
+design of your game. When designing your schema, you should attempt to
+encapsulate all the commonly used data into a small number of objects in order to
+minimize the number of queries to the database and the number of seeks in a
+query. Encapsulating all player state into a ``character`` collection, item data
+into an ``item`` collection, and location data into a ``location`` collection
+satisfies both these criteria.
+
+Character Schema
+````````````````
+
+In a role-playing game, then, a typical character state document might look
+like the following:
+
+.. code-block:: javascript
+
+   {
+     _id: ObjectId('...'),
+     name: 'Tim',
+     character: {
+       intrinsics: {
+         strength: 10,
+         dexterity: 16,
+         intelligence: 17,
+         charisma: 8 },
+       class: 'mage',
+       health: 212,
+       mana: 152
+     },
+     location: {
+       id: 'maze-1',
+       description: 'a maze of twisty little passages...',
+       exits: {n:'maze-2', s:'maze-1', e:'maze-3'},
+       players: [
+         { id:ObjectId('...'), name:'grue' },
+         { id:ObjectId('...'), name:'Tim' }
+       ],
+       inventory: [
+         { qty:1, id:ObjectId('...'), name:'scroll of cause fear' }]
+     },
+     gold: 523,
+     armor: [
+       { id:ObjectId('...'), region:'head'},
+       { id:ObjectId('...'), region:'body'},
+       { id:ObjectId('...'), region:'feet'}],
+     weapons: [ {id:ObjectId('...'), hand:'both'} ],
+     inventory: [
+       { qty:1, id:ObjectId('...'), name:'backpack', inventory: [
+         { qty:4, id:ObjectId('...'), name: 'potion of healing'},
+         { qty:1, id:ObjectId('...'), name: 'scroll of magic mapping'},
+         { qty:2, id:ObjectId('...'), name: 'c-rations'} ]},
+       { qty:1, id:ObjectId('...'), name:"wizard's hat", bonus:3},
+       { qty:1, id:ObjectId('...'), name:"wizard's robe", bonus:0},
+       { qty:1, id:ObjectId('...'), name:"old boots", bonus:0},
+       { qty:1, id:ObjectId('...'), name:"quarterstaff", bonus:2} ]
+   }
+
+There are a few things to note about this document:
+
+#. Information about the character's location in the game is encapsulated under
+   the ``location`` attribute. Note in particular that all of the information
+   necessary to render the room is encapsulated within the character state
+   document. This allows the game system to render the room without making a
+   second query to the database to get room information.
+#. The ``armor`` and ``weapons`` attributes contain little information about the
+   actual items being worn or carried. This information is actually stored under
+   the ``inventory`` property. 
Since the inventory information is stored in the
+   same document, there is no need to replicate the detailed information about
+   each item into the ``armor`` and ``weapons`` properties.
+#. ``inventory`` contains the item details necessary for
+   rendering each item in the character's possession, including any enchantments
+   (``bonus``) and quantity (``qty``). Once again, embedding this data into the
+   character record means you don't have to perform a separate query to fetch
+   item details necessary for display.
+
+Item Schema
+```````````
+
+Likewise, the item schema should include all details about all items globally in
+the game:
+
+.. code-block:: javascript
+
+   {
+     _id: ObjectId('...'),
+     name: 'backpack',
+     bonus: null,
+     inventory: [
+       { qty:4, id:ObjectId('...'), name: 'potion of healing'},
+       { qty:1, id:ObjectId('...'), name: 'scroll of magic mapping'},
+       { qty:2, id:ObjectId('...'), name: 'c-rations'} ],
+     weight: 12,
+     price: 160,
+     ...
+   }
+
+Note that this document contains more or less the same information as stored in
+the ``inventory`` attribute of ``character`` documents, as well as additional
+data which may only be needed sporadically during game-play, such as
+``weight`` and ``price``.
+
+Location Schema
+```````````````
+
+Finally, the ``location`` schema specifies the state of the world in the game:
+
+.. code-block:: javascript
+
+   {
+     _id: 'maze-1',
+     description: 'a maze of twisty little passages...',
+     exits: {n:'maze-2', s:'maze-1', e:'maze-3'},
+     players: [
+       { id:ObjectId('...'), name:'grue' },
+       { id:ObjectId('...'), name:'Tim' } ],
+     inventory: [
+       { qty:1, id:ObjectId('...'), name:'scroll of cause fear' } ]
+   }
+
+Here, note that ``location`` stores the same information as is stored in
+the ``location`` attribute of the ``character`` document, keyed by ``_id``
+rather than ``id``. You will use
+``location`` as the system of record when the game requires interaction between
+multiple characters or between characters and non-inventory items.
+
+Operations
+----------
+
+In an online gaming system with the state embedded in a single document for
+``character``, ``item``, and ``location``, the primary operations you'll be
+performing are querying for the character state by ``_id``, extracting relevant
+data for display, and updating various attributes about the character. This
+section describes procedures for performing these queries, extractions, and
+updates.
+
+In particular you should try *not* to load the ``location`` or ``item`` documents
+except when absolutely necessary.
+
+The examples that follow use the Python programming language and the
+:api:`PyMongo ` :term:`driver` for MongoDB, but you
+can implement this system using any language you choose.
+
+Load Character Data from MongoDB
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The most basic operation in this system is loading the character state.
+
+Query
+`````
+
+Use the following query to load the ``character`` document from MongoDB:
+
+.. code-block:: pycon
+
+   >>> character = db.character.find_one({'_id': character_id})
+
+Index Support
+`````````````
+
+In this case, the default index that MongoDB supplies on the ``_id`` field is
+sufficient for good performance of this query.
+
+Extract Armor and Weapon Data for Display
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In order to save space, the ``character`` schema described above stores item
+details only in the ``inventory`` attribute, storing ``ObjectId``\ s in other
+locations. To display these item details, as on a character summary window, you
+need to merge the information from the ``armor`` and ``weapons`` attributes with
+information from the ``inventory`` attribute.
+
+Suppose, for instance, that your code is displaying the armor data using the
+following Jinja2 template:
+
+.. code-block:: html
+
+
+   <div>
+     <h2>Armor</h2>
+     <dl>
+       {% if value.head %}
+       <dt>Helmet</dt>
+       <dd>{{value.head[0].description}}</dd>
+       {% endif %}
+       {% if value.hands %}
+       <dt>Gloves</dt>
+       <dd>{{value.hands[0].description}}</dd>
+       {% endif %}
+       {% if value.feet %}
+       <dt>Boots</dt>
+       <dd>{{value.feet[0].description}}</dd>
+       {% endif %}
+       {% if value.body %}
+       <dt>Body Armor</dt>
+       <dd><ul>{% for piece in value.body %}
+         <li>{{piece.description}}</li>
+         {% endfor %}</ul></dd>
+       {% endif %}
+     </dl>
+   </div>
+
+
+In this case, you want the various ``description`` fields above to be text
+similar to "+3 wizard's hat." The context passed to the template above, then,
+would be of the following form:
+
+.. code-block:: python
+
+   {
+       "head": [ { "id":..., "description": "+3 wizard's hat" } ],
+       "hands": [],
+       "feet": [ { "id":..., "description": "old boots" } ],
+       "body": [ { "id":..., "description": "wizard's robe" } ],
+   }
+
+In order to build up this structure, use the following helper functions:
+
+.. code-block:: python
+
+   def get_item_index(inventory):
+       '''Given an inventory attribute, recursively build up an item
+       index (including all items contained within other items)
+       '''
+
+       result = {}
+       for item in inventory:
+           # embedded inventory entries are keyed by 'id', not '_id'
+           result[item['id']] = item
+           if 'inventory' in item:
+               result.update(get_item_index(item['inventory']))
+       return result
+
+   def describe_item(item):
+       '''Add a 'description' field to the given item'''
+
+       result = dict(item)
+       # not every item carries a bonus (e.g. a backpack)
+       if item.get('bonus'):
+           description = '%+d %s' % (item['bonus'], item['name'])
+       else:
+           description = item['name']
+       result['description'] = description
+       return result
+
+   def get_armor_for_display(character, item_index):
+       '''Given a character document, return an 'armor' value
+       suitable for display'''
+
+       result = dict(head=[], hands=[], feet=[], body=[])
+       for piece in character['armor']:
+           item = describe_item(item_index[piece['id']])
+           result[piece['region']].append(item)
+       return result
+
+In order to actually display the armor, then, you would use the following code:
+
+.. code-block:: pycon
+
+   >>> item_index = get_item_index(
+   ...     character['inventory'] + character['location']['inventory'])
+   >>> armor = get_armor_for_display(character, item_index)
+
+Note in particular that you are building an index not only for the items the
+character is actually carrying in inventory, but also for the items that the
+player might interact with in the room.
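As a quick check of the helpers above, the following self-contained sketch runs them against an in-memory character document instead of a MongoDB query. The sample data and the string ids (``hat-1``, ``pack-1``, and so on) are hypothetical stand-ins for ``ObjectId`` values, and the index is assumed to be keyed on the ``id`` field used by the embedded inventory entries:

```python
# In-memory sketch: plain strings stand in for ObjectId values (assumption).

def get_item_index(inventory):
    """Recursively index items, including items nested in containers, by 'id'."""
    result = {}
    for item in inventory:
        result[item['id']] = item
        if 'inventory' in item:
            result.update(get_item_index(item['inventory']))
    return result

def describe_item(item):
    """Return a copy of the item with a display 'description' added."""
    result = dict(item)
    if item.get('bonus'):
        result['description'] = '%+d %s' % (item['bonus'], item['name'])
    else:
        result['description'] = item['name']
    return result

def get_armor_for_display(character, item_index):
    """Group the described armor pieces by body region."""
    result = dict(head=[], hands=[], feet=[], body=[])
    for piece in character['armor']:
        item = describe_item(item_index[piece['id']])
        result[piece['region']].append(item)
    return result

# Hypothetical sample data shaped like the character document above.
character = {
    'armor': [
        {'id': 'hat-1', 'region': 'head'},
        {'id': 'robe-1', 'region': 'body'},
        {'id': 'boots-1', 'region': 'feet'}],
    'inventory': [
        {'qty': 1, 'id': 'pack-1', 'name': 'backpack', 'inventory': [
            {'qty': 4, 'id': 'potion-1', 'name': 'potion of healing'}]},
        {'qty': 1, 'id': 'hat-1', 'name': "wizard's hat", 'bonus': 3},
        {'qty': 1, 'id': 'robe-1', 'name': "wizard's robe", 'bonus': 0},
        {'qty': 1, 'id': 'boots-1', 'name': 'old boots', 'bonus': 0}],
    'location': {'inventory': [
        {'qty': 1, 'id': 'scroll-1', 'name': 'scroll of cause fear'}]},
}

item_index = get_item_index(
    character['inventory'] + character['location']['inventory'])
armor = get_armor_for_display(character, item_index)

print(armor['head'][0]['description'])  # +3 wizard's hat
print(armor['body'][0]['description'])  # wizard's robe
print(armor['hands'])                   # []
```

Because containers are indexed recursively, the potion inside the backpack ends up in ``item_index`` alongside the top-level items and the scroll lying in the room.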
+ +Similarly, in order to display the weapon information, you need to build a +structure such as the following: + +.. code-block:: python + + { + "left": None, + "right": None, + "both": { "description": "+2 quarterstaff" } + } + +The helper function is similar to that for ``get_armor_for_display``: + +.. code-block:: python + + def get_weapons_for_display(character, item_index): + '''Given a character document, return a 'weapons' value + suitable for display''' + + result = dict(left=None, right=None, both=None) + for piece in character['weapons']: + item = describe_item(item_index[piece['id']]) + result[piece['hand']] = item + return result + +In order to actually display the weapons, then, you would use the following code: + +.. code-block:: pycon + + >>> weapons = get_weapons_for_display(character, item_index) + +Extract Character Attributes, Inventory, and Room Information for Display +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In order to display information about the character's attributes, inventory, and +surroundings, you also need to extract fields from the character state. In this +case, however, the schema defined above keeps all the relevant information for +display embedded in those sections of the document. The code for extracting this +data, then, is the following: + +.. code-block:: pycon + + >>> attributes = character['character'] + >>> inventory = character['inventory'] + >>> room_data = character['location'] + +Pick Up an Item From a Room +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In your game, suppose the player decides to pick up an item from the room and add +it to their inventory. In this case, you need to update both the character state +and the global location state: + +..
code-block:: python + + def pick_up_item(character, item_index, item_id): + '''Transfer an item from the current room to the character's inventory''' + + item = item_index[item_id] + character['inventory'].append(item) + db.character.update( + { '_id': character['_id'] }, + { '$push': { 'inventory': item }, + '$pull': { 'location.inventory': { '_id': item['id'] } } }) + db.location.update( + { '_id': character['location']['id'] }, + { '$pull': { 'inventory': { 'id': item_id } } }) + +The above code may suffice for a single-player game. If you allow multiple +players, or non-player characters, to pick up items, however, two characters may +try to pick up the same item simultaneously. To guard against that, use the +``location`` collection to break ties. In this case, the code is now the +following: + +.. code-block:: python + + def pick_up_item(character, item_index, item_id): + '''Transfer an item from the current room to the character's inventory''' + + item = item_index[item_id] + result = db.location.update( + { '_id': character['location']['id'], + 'inventory.id': item_id }, + { '$pull': { 'inventory': { 'id': item_id } } }, + safe=True) + if not result['updatedExisting']: + raise Conflict() + # Only touch the in-memory character once the room update has succeeded + character['inventory'].append(item) + db.character.update( + { '_id': character['_id'] }, + { '$push': { 'inventory': item }, + '$pull': { 'location.inventory': { '_id': item['id'] } } }) + +By ensuring that the item is present before removing it from the room in the +``update`` call above, you guarantee that only one player/non-player +character/monster can pick up the item. + +Remove an Item from a Container +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In the game described here, the ``backpack`` item can contain other +items. You might further suppose that some other items may be similarly +hierarchical (e.g. a chest in a room).
Suppose that the player wishes to move an +item from one of these "containers" into their active ``inventory`` as a prelude +to using it. In this case, you need to update both the character state and the +item state: + +.. code-block:: python + + def move_to_active_inventory(character, item_index, container_id, item_id): + '''Transfer an item from the given container to the character's active + inventory + ''' + + result = db.item.update( + { '_id': container_id, + 'inventory.id': item_id }, + { '$pull': { 'inventory': { 'id': item_id } } }, + safe=True) + if not result['updatedExisting']: + raise Conflict() + item = item_index[item_id] + container = item_index[container_id] + character['inventory'].append(item) + # Use a distinct loop variable so that 'item' is not overwritten + container['inventory'] = [ + contained for contained in container['inventory'] + if contained['_id'] != item_id ] + db.character.update( + { '_id': character['_id'] }, + { '$push': { 'inventory': item } } ) + db.character.update( + { '_id': character['_id'], 'inventory.id': container_id }, + { '$pull': { 'inventory.$.inventory': { 'id': item_id } } } ) + +Note in the code above that you: + +- Ensure that the item's state makes this update reasonable (the item is + actually contained within the container). Abort with an error if this is not + true. +- Update the in-memory ``character`` document's inventory, adding the item. +- Update the in-memory ``container`` document's inventory, removing the item. +- Update the ``character`` document in MongoDB. +- In the case that the character is moving an item from a container *in his own + inventory*, update the character's inventory representation of the container. + +Move the Character to a Different Room +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In your game, suppose the player decides to move north. In this case, you need to +update the character state to match the new location: + +..
code-block:: python + + def move(character, direction): + '''Move the character to a new location''' + + # Remove character from current location + db.location.update( + {'_id': character['location']['id'] }, + {'$pull': {'players': {'id': character['_id'] } } }) + # Add character to new location, retrieve new location data + new_location = db.location.find_and_modify( + { '_id': character['location']['exits'][direction] }, + { '$push': { 'players': { + 'id': character['_id'], + 'name': character['name'] } } }, + new=True) + character['location'] = new_location + db.character.update( + { '_id': character['_id'] }, + { '$set': { 'location': new_location } }) + +Here, note that the code updates the old room, the new room, and the character +document. + +Buy an Item +~~~~~~~~~~~ + +If your character wants to buy an item, you need to add that item to the +character's inventory, decrement the character's gold, increment the shopkeeper's +gold, and update the room: + +.. code-block:: python + + def buy(character, shopkeeper, item_index, item_id): + '''Pick up an item, add to the character's inventory, and transfer + payment to the shopkeeper + ''' + + price = db.item.find_one({'_id': item_id}, {'price':1})['price'] + result = db.character.update( + { '_id': character['_id'], + 'gold': { '$gte': price } }, + { '$inc': { 'gold': -price } }, + safe=True ) + if not result['updatedExisting']: + raise InsufficientFunds() + try: + # pick_up_item requires the item index built for display + pick_up_item(character, item_index, item_id) + except: + # Add the gold back to the character + result = db.character.update( + { '_id': character['_id'] }, + { '$inc': { 'gold': price } } ) + raise + character['gold'] -= price + db.character.update( + { '_id': shopkeeper['_id'] }, + { '$inc': { 'gold': price } } ) + +Note that the code above ensures that the character has sufficient gold to pay for +the item using the ``updatedExisting`` trick used for picking up items.
The race +condition for item pickup is handled as well, "rolling back" the removal of gold +from the character's wallet if the item cannot be picked up. + +Sharding +-------- + +If your system needs to scale beyond a single MongoDB node, you will want to +use a :term:`shard cluster`, which takes advantage of MongoDB's +:term:`sharding` functionality. + +.. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding` wiki + page. + +Sharding in this use case is fairly +straightforward, since all our items are always retrieved by ``_id``. To shard +the ``character`` and ``location`` collections, the commands would be the +following: + +.. code-block:: pycon + + >>> db.command('shardcollection', 'character', { + ... 'key': { '_id': 1 } }) + { "collectionsharded" : "character", "ok" : 1 } + >>> db.command('shardcollection', 'location', { + ... 'key': { '_id': 1 } }) + { "collectionsharded" : "location", "ok" : 1 } + diff --git a/source/applications/use-cases/index.txt b/source/applications/use-cases/index.txt index 6144bf7e3dd..53c4f96c0b6 100644 --- a/source/applications/use-cases/index.txt +++ b/source/applications/use-cases/index.txt @@ -33,3 +33,28 @@ Content Management Systems cms-metadata-and-asset-management cms-storing-comments + +Online Gaming +------------- + +.. toctree:: + :maxdepth: 2 + + gaming-user-state + +Online Advertising +------------------ + +.. toctree:: + :maxdepth: 2 + + ad-serving-ads + ad-campaign-management + +Social Networking +----------------- + +.. toctree:: + :maxdepth: 2 + + social-user-profile diff --git a/source/applications/use-cases/real-time-analytics-hierarchical-aggregation.txt b/source/applications/use-cases/real-time-analytics-hierarchical-aggregation.txt index d6aacb77cfd..3457a1f04b3 100644 --- a/source/applications/use-cases/real-time-analytics-hierarchical-aggregation.txt +++ b/source/applications/use-cases/real-time-analytics-hierarchical-aggregation.txt @@ -470,24 +470,22 @@ timestamp) on the events collection. 
Consider the following: .. code-block:: pycon >>> db.command('shardcollection','events', { - ... key : { 'userid': 1, 'ts' : 1} } ) - -Upon success, you will see the following response: - -.. code-block:: javascript - + ... 'key' : { 'userid': 1, 'ts' : 1} } ) { "collectionsharded": "events", "ok" : 1 } -To shard the aggregated collections you must use the ``_id`` field, -which is the default, so you can issue the following group of shard -operations in the Python/PyMongo shell: +To shard the aggregated collections you must use the ``_id`` field, so you can +issue the following group of shard operations in the Python/PyMongo shell: .. code-block:: python - db.command('shardcollection', 'stats.daily') - db.command('shardcollection', 'stats.weekly') - db.command('shardcollection', 'stats.monthly') - db.command('shardcollection', 'stats.yearly') + db.command('shardcollection', 'stats.daily', { + 'key': { '_id': 1 } }) + db.command('shardcollection', 'stats.weekly', { + 'key': { '_id': 1 } }) + db.command('shardcollection', 'stats.monthly', { + 'key': { '_id': 1 } }) + db.command('shardcollection', 'stats.yearly', { + 'key': { '_id': 1 } }) You should also update the ``h_aggregate`` map-reduce wrapper to support sharded output Add ``'sharded':True`` to the ``out`` diff --git a/source/applications/use-cases/real-time-analytics-pre-aggregated-reports.txt b/source/applications/use-cases/real-time-analytics-pre-aggregated-reports.txt index da0a3e0e6eb..00b6d01c82b 100644 --- a/source/applications/use-cases/real-time-analytics-pre-aggregated-reports.txt +++ b/source/applications/use-cases/real-time-analytics-pre-aggregated-reports.txt @@ -621,12 +621,7 @@ collection in the Python/PyMongo console: .. code-block:: pycon >>> db.command('shardcollection', 'stats.daily', { - ... key:{'metadata.site':1,'metadata.page':1,'metadata.date':1}}) - -Upon success, you will see the following response: - -.. code-block:: javascript - + ... 
'key':{'metadata.site':1,'metadata.page':1,'metadata.date':1}}) { "collectionsharded" : "stats.daily", "ok" : 1 } Enable sharding for the monthly statistics collection with the @@ -636,12 +631,7 @@ console: .. code-block:: pycon >>> db.command('shardcollection', 'stats.monthly', { - ... key:{'metadata.site':1,'metadata.page':1,'metadata.date':1}}) - -Upon success, you will see the following response: - -.. code-block:: javascript - + ... 'key':{'metadata.site':1,'metadata.page':1,'metadata.date':1}}) { "collectionsharded" : "stats.monthly", "ok" : 1 } .. note:: diff --git a/source/applications/use-cases/real-time-analytics-storing-log-data.txt b/source/applications/use-cases/real-time-analytics-storing-log-data.txt index a11e998e6d6..61da2a0716f 100644 --- a/source/applications/use-cases/real-time-analytics-storing-log-data.txt +++ b/source/applications/use-cases/real-time-analytics-storing-log-data.txt @@ -7,7 +7,7 @@ Real Time Analytics: Storing Log Data Overview -------- -This document outlines the basic patterns and principals for using +This document outlines the basic patterns and principles for using MongoDB as a persistent storage engine for log data from servers and other machine data. diff --git a/source/applications/use-cases/social-user-profile.txt b/source/applications/use-cases/social-user-profile.txt new file mode 100644 index 00000000000..b94ecca4a3a --- /dev/null +++ b/source/applications/use-cases/social-user-profile.txt @@ -0,0 +1,589 @@ +.. -*- rst -*- + +================================== +Social Networking: Storing Updates +================================== + +.. default-domain:: mongodb + +Overview +-------- + +This document outlines the basic patterns and principles for using +MongoDB as a persistent storage engine for a social network. In particular, this +document focuses on the task of storing and displaying user updates.
+ +Problem +~~~~~~~ + +You want to create a social network that will store profile information about +each user as well as allow the user to create various types of posts and updates +which will then be seen on their "friends'" walls. + +Solution +~~~~~~~~ + +The solution described below assumes a *directed* social graph where a user can +choose whether or not to follow another user. Additionally, the user can +designate "circles" of users to follow, in order to facilitate fine-grained +control of privacy. The solution presented below is designed in such a +way as to minimize the number of documents that must be loaded in order to +display any given page, even at the expense of complicating updates. + +The particulars of what type of data you want to host on your social network +depend on the type of social network you are designing, and are largely +beyond the scope of this use case. In particular, the main variables that you +will have to consider in adapting this use case to your particular situation are: + +What data should be in a user profile? + This may include gender, age, interests, relationship status, etc. for a + "casual" social network, or may include resume-type data for a more + "business-oriented" social network. +What type of updates are allowed? + Again, depending on what flavor of social network you are designing, you may + wish to allow posts such as status updates, photos, links, checkins, and + polls, or you may wish to restrict your users to links and status updates. + +Schema Design +~~~~~~~~~~~~~ + +In the solution presented here, you will use two main "independent" collections +and two "dependent" collections to store user profile data and posts. + +Independent Collections +``````````````````````` + +The first +collection, ``social.user``, stores the social graph information for a given user +along with the user's profile data: + +..
code-block:: javascript + + { + _id: 'T4Y...AC', // base64-encoded ObjectId + name: 'Rick', + profile: { ... age, location, interests, etc. ... }, + followers: { + "T4Y...AD": { name: 'Jared', circles: [ 'python', 'authors'] }, + "T4Y...AF": { name: 'Bernie', circles: [ 'python' ] }, + "T4Y...AI": { name: 'Meghan', circles: [ 'python', 'speakers' ] }, + ... + }, + circles: { + "10gen": { + "T4Y...AD": { name: 'Jared' }, + "T4Y...AE": { name: 'Max' }, + "T4Y...AF": { name: 'Bernie' }, + "T4Y...AH": { name: 'Paul' }, + ... }, + ... + }, + blocked: ['gh1...0d'] + } + +There are a few things to note about this schema: + +- Rather than using a "raw" ``ObjectId`` for your ``_id`` field, you'll use a + base64-encoded version. This allows you to use ``_id`` values as keys in + subdocuments, which both reduces the memory footprint of these subdocuments + and speeds up some operations. +- The social graph is stored bidirectionally in the ``followers`` and ``circles`` + fields. While this is technically redundant, having the bidirectional + connections is useful both for displaying the user's followers on the profile + page and for propagating posts to other users, as shown below. +- In addition to the normal "positive" social graph, the schema above also stores + a block list which contains an array of user ids for posters whose posts never + appear on the user's wall or news feed. +- The particular profile data stored for the user is isolated into the + ``profile`` subdocument, allowing you to evolve the schema as necessary without + +Of course, to make the network interesting, it's necessary to add various types of +posts. These are stored in the ``social.post`` collection: + +..
code-block:: javascript + + { + _id: ObjectId(...), + by: { id: "T4Y...AE", name: 'Max' }, + circles: [ '*public*' ], + type: 'status', + ts: ISODateTime(...), + detail: { + text: 'Loving MongoDB' }, + comments: [ + { by: { id:"T4Y...AG", name: 'Dwight' }, + ts: ISODateTime(...), + text: 'Right on!' }, + ... all comments listed ... ] + } + +Here, the post stores minimal author information (``by``), the post ``type``, a +timestamp ``ts``, post details ``detail`` (which vary by post type), and a +``comments`` array. In this case, the schema embeds all comments on a post as a +time-sorted flat array. For a more in-depth exploration of the other approaches +to storing comments, please see the document +:doc:`CMS: Storing Comments `. + +A couple of points are worthy of further discussion: + +- Author information is truncated; just enough is stored in each ``by`` property + to display the author name and a link to the author profile. If your user + wants more detail on a particular author, you can fetch this information as + they request it. Storing minimal information like this helps keep the document + small (and therefore fast.) +- The visibility of the post is controlled via the ``circles`` property; any user + that is part of one of the listed circles can view the post. The special values + ``"\*public*"`` and ``"\*circles*"`` allow the user to share a post with the + whole world or with any users in any of the posting user's circles, respectively. +- Different types of posts may contain different types of data in the ``detail`` + field. Isolating this polymorphic information into a subdocument is a good + practice, helping you to clearly see which parts of the document are common to + all posts and which can vary. In this case, you would store different data for + a photo post versus a status update, while still keeping the metadata (``_id``, + ``by``, ``circles``, ``type``, ``ts``, and ``comments``) the same. 
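The split between common metadata and a polymorphic ``detail`` subdocument can be illustrated with a small builder. This is only a sketch under stated assumptions: ``make_post`` and ``COMMON_FIELDS`` are hypothetical names, the user document and the photo ``detail`` fields (``caption``, ``url``) are invented for illustration, and nothing here talks to MongoDB.

```python
from datetime import datetime, timezone

# Keys shared by every post, regardless of its type.
COMMON_FIELDS = ('by', 'circles', 'type', 'ts', 'detail', 'comments')


def make_post(author, circles, post_type, detail):
    """Build a post document; only the contents of ``detail`` vary by type.

    ``author`` is assumed to be a user document with ``_id`` and ``name``.
    Only those two fields are copied into ``by`` to keep the post small.
    """
    return {
        'by': {'id': author['_id'], 'name': author['name']},
        'circles': list(circles),
        'type': post_type,
        'ts': datetime.now(timezone.utc),
        'detail': dict(detail),
        'comments': [],
    }


# Hypothetical user document; the profile is deliberately not copied.
author = {'_id': 'user-max', 'name': 'Max', 'profile': {'location': 'NYC'}}

status = make_post(author, ['*public*'], 'status',
                   {'text': 'Loving MongoDB'})
photo = make_post(author, ['*circles*'], 'photo',
                  {'caption': 'Team lunch',
                   'url': 'http://example.com/lunch.jpg'})
```

Both documents expose the same top-level keys, so code that lists, filters, or comments on posts never needs to inspect ``detail``; only the renderer for a given ``type`` does.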
+ +Dependent Collections +````````````````````` + +In addition to the independent collections above, for optimal performance you'll +need to create a few dependent collections that will be used to cache +information for display. The first of these collections is the ``social.wall`` +collection, and is intended to display a "wall" containing posts created by or +directed to a particular user. The format of the ``social.wall`` collection +follows. + +.. code-block:: javascript + + { + _id: ObjectId(...), + user_id: "T4Y...AE", + month: '201204', + posts: [ + { id: ObjectId(...), + ts: ISODateTime(...), + by: { id: "T4Y...AE", name: 'Max' }, + circles: [ '*public*' ], + type: 'status', + detail: { text: 'Loving MongoDB' }, + comments_shown: 3, + comments: [ + { by: { id: "T4Y...AG", name: 'Dwight' }, + ts: ISODateTime(...), + text: 'Right on!' }, + ... only last 3 comments listed ... + ] + }, + { id: ObjectId(...), + ts: ISODateTime(...), + by: { id: "T4Y...AE", name: 'Max' }, + circles: [ '*circles*' ], + type: 'checkin', + detail: { + text: 'Great office!', + geo: [ 40.724348,-73.997308 ], + name: '10gen Office', + photo: 'http://....' }, + comments_shown: 1, + comments: [ + { by: { id: "T4Y...AD", name: 'Jared' }, + ts: ISODateTime(...), + text: 'Wrong coast!' }, + ... only last 1 comment listed ... + ] + }, + { id: ObjectId(...), + ts: ISODateTime(...), + by: { id: "T4Y...g9", name: 'Rick' }, + circles: [ '10gen' ], + type: 'status', + detail: { + text: 'So when do you crush Oracle?' }, + comments_shown: 2, + comments: [ + { by: { id: "T4Y...AE", name: 'Max' }, + ts: ISODateTime(...), + text: 'Soon... ;-)' }, + ... only last 2 comments listed ... + ] + }, + ... + ] + } + +There are a few things to note about this schema: + +- Each post is listed with an abbreviated number of comments (3 might be + typical.) This is to keep the size of the document reasonable.
If you need to + display more comments on a post, you would then query the ``social.post`` + collection for full details. +- There are actually multiple ``social.wall`` documents for each ``social.user`` + document, one wall document per month. This allows the system to keep a "page" of + recent posts in the initial page view, fetching older months if requested. +- Once again, the ``by`` properties store only the minimal author information for + display, helping to keep this document small. +- The number of comments on each post is stored to allow later updates to find + posts with more than a certain number of comments since the ``$size`` query + operator does not allow inequality comparisons. + +The other dependent collection you'll use is ``social.news``, which holds posts +from people the user follows. This schema includes much of the same information as +the ``social.wall`` collection, so the document below has been abbreviated for +clarity: + +.. code-block:: javascript + + { + _id: ObjectId(...), + user_id: "T4Y...AE", + month: '201204', + posts: [ ... ] + } + +Operations +---------- + +Since the schemas above optimize for read performance at the possible expense +of write performance, you should ideally provide a queueing system for +processing updates which may take longer than your desired web request latency. + +The examples that follow use the Python programming language and the +:api:`PyMongo ` :term:`driver` for MongoDB, but you +can implement this system using any language you choose. + +Viewing a News Feed or Wall Posts +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The most common operation on a social network is probably the display of a +particular user's news feed, followed by a user's wall posts. Since the +``social.news`` and ``social.wall`` collections are optimized for these +operations, the query is fairly straightforward.
Since these two collections +share a schema, viewing the posts for a news feed or a wall are actually quite +similar operations, and can be supported by the same code: + +.. code-block:: python + + def get_posts(collection, user_id, month=None): + spec = { 'user_id': user_id } + if month is not None: + spec['month'] = {'$lte': month} + cur = collection.find(spec) + cur = cur.sort('month', -1) + for page in cur: + for post in reversed(page['posts']): + yield page['month'], post + +The function ``get_posts`` above will retrieve all the posts on a particular user's +wall or news feed in reverse-chronological order. Some special handling is +required to efficiently achieve the reverse-chronological ordering: + +- The ``posts`` within a month are actually stored in chronological order, so the + order of these posts must be reversed before displaying. +- As a user pages through her wall, it's preferable to avoid fetching the first + few months from the server each time. To achieve this, the code above specifies + the first month to fetch in the ``month`` argument, passing this in as an + ``$lte`` expression in the query. +- Rather than only yielding the post itself, the post's month is also yielded from + the generator. This provides the ``month`` argument to be used in any + subsequent calls to ``get_posts``. + +There is one other issue that needs to be considered in selecting posts for +display: privacy settings. In order to handle privacy issues effectively, you'll +need to use some filter functions on the posts generated above by ``get_posts``. The +first of these filters is used to determine whether to show a post when the user +is viewing his or her own wall: + +..
code-block:: python + + def visible_on_own_wall(user, post): + '''if poster is followed by user, post is visible''' + for circle, users in user['circles'].items(): + if post['by']['id'] in users: return True + return False + +In addition to the user's wall, your social network might provide an "incoming" +page that contains all posts directed towards a user regardless of whether that +poster is followed by the user. In this case, you would use a block list +to filter posts: + +.. code-block:: python + + def visible_on_own_incoming(user, post): + '''if poster is not blocked by user, post is visible''' + return post['by']['id'] not in user['blocked'] + +When viewing a news feed or another user's wall, the permission check is a bit +different based on the post's ``circles`` property: + +.. code-block:: python + + def visible_post(user, post): + if post['circles'] == ['*public*']: + # public posts always visible + return True + # each 'followers' entry holds the circles the poster placed the user in + follower = user['followers'].get(post['by']['id'], {}) + circles_user_is_in = set(follower.get('circles', [])) + if not circles_user_is_in: + # user is not circled by poster; post is invisible + return False + if post['circles'] == ['*circles*']: + # post is public to all followed users; post is visible + return True + for circle in post['circles']: + if circle in circles_user_is_in: + # User is in a circle receiving this post + return True + return False + +Index Support +````````````` + +In order to quickly retrieve the pages in the desired order, you'll need an index +on (``user_id``, ``month``) in both the ``social.news`` and ``social.wall`` +collections. + +.. code-block:: pycon + + >>> for collection in (db.social.news, db.social.wall): + ... collection.ensure_index([ + ... ('user_id', 1), + ... ('month', -1)]) + +Commenting on a Post +~~~~~~~~~~~~~~~~~~~~ + +Other than viewing walls and news feeds, creating new posts is the next most +common action taken on social networks.
To create a comment by ``user`` on a +given ``post`` containing the given ``text``, you'll need to execute code similar +to the following: + +.. code-block:: python + + from datetime import datetime + + def comment(user, post_id, text): + ts = datetime.utcnow() + month = ts.strftime('%Y%m') + comment = { + 'by': { 'id': user['_id'], 'name': user['name'] }, + 'ts': ts, + 'text': text } + # Update the social.post collection + db.social.post.update( + { '_id': post_id }, + { '$push': { 'comments': comment } } ) + # Update social.wall and social.news collections; the positional + # operator targets the matching entry of the 'posts' array + db.social.wall.update( + { 'posts.id': post_id }, + { '$push': { 'posts.$.comments': comment }, + '$inc': { 'posts.$.comments_shown': 1 } }, + multi=True) + db.social.news.update( + { 'posts.id': post_id }, + { '$push': { 'posts.$.comments': comment }, + '$inc': { 'posts.$.comments_shown': 1 } }, + multi=True) + +.. note:: + + One thing to note in this function is the presence of a couple of ``multi=True`` + update statements. Since these can potentially take quite a long time, this + function is a good candidate for processing 'out of band' with the regular + request-response flow of your application. + +The code above can actually result in an unbounded number of comments being +inserted into the ``social.wall`` and ``social.news`` collections. To compensate +for this, you should periodically run the following update statement to truncate +the number of displayed comments and keep the size of the news and wall documents +manageable: + +..
code-block:: python + + COMMENTS_SHOWN = 3 + + def truncate_extra_comments(): + db.social.news.update( + { 'posts.comments_shown': { '$gt': COMMENTS_SHOWN } }, + { '$pop': { 'posts.$.comments': -1 }, + '$inc': { 'posts.$.comments_shown': -1 } }, + multi=True) + db.social.wall.update( + { 'posts.comments_shown': { '$gt': COMMENTS_SHOWN } }, + { '$pop': { 'posts.$.comments': -1 }, + '$inc': { 'posts.$.comments_shown': -1 } }, + multi=True) + +Index Support +````````````` +In order to execute the updates to the ``social.news`` and ``social.wall`` +collections shown above efficiently, you'll need to be able to quickly locate both +of the following types of documents: + +- Documents containing a given post +- Documents containing posts displaying too many comments + +To quickly execute these updates, then, you'll need to create the following +indexes: + +.. code-block:: pycon + + >>> for collection in (db.social.news, db.social.wall): + ... collection.ensure_index('posts.id') + ... collection.ensure_index('posts.comments_shown') + +Creating a New Post +~~~~~~~~~~~~~~~~~~~ + +Creating a new post fills out the content-creation activities on a social +network: + +..
code-block:: python + + from datetime import datetime + + def post(user, dest_user, type, detail, circles): + ts = datetime.utcnow() + month = ts.strftime('%Y%m') + post = { + 'ts': ts, + 'by': { 'id': user['_id'], 'name': user['name'] }, + 'circles': circles, + 'type': type, + 'detail': detail, + 'comments': [] } + # Update global post collection + db.social.post.insert(post) + # Copy to dest user's wall + if user['_id'] not in dest_user['blocked']: + append_post(db.social.wall, [dest_user['_id']], month, post) + # Copy to followers' news feeds + if circles == ['*public*']: + dest_userids = set(user['followers'].keys()) + else: + dest_userids = set() + if circles == [ '*circles*' ]: + circles = user['circles'].keys() + for circle in circles: + dest_userids.update(user['circles'][circle]) + append_post(db.social.news, dest_userids, month, post) + +The basic sequence of operations in the code above is the following: + +#. The post is first saved into the "system of record," the ``social.post`` + collection. +#. The recipient's wall is updated with the post. +#. The news feeds of everyone who is 'circled' in the post are updated with the + post. + +Updating a particular wall or group of news feeds is then accomplished using the +``append_post`` function: + +.. code-block:: python + + def append_post(collection, dest_userids, month, post): + collection.update( + { 'user_id': { '$in': sorted(dest_userids) }, + 'month': month }, + { '$push': { 'posts': post } }, + multi=True) + +Index Support +````````````` + +In order to quickly update the ``social.wall`` and ``social.news`` collections, +you'll once again need an index on both ``user_id`` and ``month``. This time, +however, the optimal order on the indexes is (``month``, ``user_id``). This is +due to the fact that updates to these collections will always be for the current +month; having month appear first in the index makes the index *right-aligned*, +requiring significantly less memory to store the active part of the index.
+ +*However*, in this case, since you have already defined an index on (``user_id``, + ``month``), which *must* be in that order so that you can do the sort on + ``month``, adding a second index is unnecessary, and would end up actually using + more RAM to maintain two indexes. So even though this particular operation would + benefit from having an index on (``month``, ``user_id``), it's best to leave out + any additional indexes here. + +Maintaining the Social Graph +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In your social network, maintaining the social graph is an infrequent but +essential operation. To add a user ``other`` to the current user +``self``\'s circles, you'll need to run the following function: + +.. code-block:: python + + def circle_user(self, other, circle): + circles_path = 'circles.%s.%s' % (circle, other['_id']) + db.social.user.update( + { '_id': self['_id'] }, + { '$set': { circles_path: { 'name': other['name' ]} } }) + follower_circles = 'followers.%s.circles' % self['_id'] + follower_name = 'followers.%s.name' % self['_id'] + db.social.user.update( + { '_id': other['_id'] }, + { '$push': { follower_circles: circle }, + '$set': { follower_name: self['name'] } }) + +Note that in this solution, previous posts of the ``other`` user are not added to +the ``self`` user's news feed or wall. To actually include these past posts would +be an expensive and complex operation, and goes beyond the scope of this use case. + +Of course, you'll also need to support *removing* users from circles: + +..
code-block:: python + + def uncircle_user(self, other, circle): + circles_path = 'circles.%s.%s' % (circle, other['_id']) + db.social.user.update( + { '_id': self['_id'] }, + { '$unset': { circles_path: 1 } }) + follower_circles = 'followers.%s.circles' % self['_id'] + db.social.user.update( + { '_id': other['_id'] }, + { '$pull': { follower_circles: circle } }) + # Special case -- 'other' is completely uncircled + db.social.user.update( + { '_id': other['_id'], follower_circles: {'$size': 0 } }, + { '$unset': { 'followers.%s' % self['_id']: 1 } }) + +Index Support +````````````` + +In both the circling and uncircling cases, the ``_id`` is included in the update +queries, so no additional indexes are required. + +Sharding +-------- + +In order to scale beyond the capacity of a single replica set, you will need to +shard each of the collections mentioned above. Since the ``social.user``, +``social.wall``, and ``social.news`` collections contain documents which are +specific to a given user, the user's ``_id`` field is an appropriate shard key: + +.. code-block:: pycon + + >>> db.command('shardcollection', 'social.user', { + ... 'key': {'_id': 1 } } ) + { "collectionsharded": "social.user", "ok": 1 } + >>> db.command('shardcollection', 'social.wall', { + ... 'key': {'user_id': 1 } } ) + { "collectionsharded": "social.wall", "ok": 1 } + >>> db.command('shardcollection', 'social.news', { + ... 'key': {'user_id': 1 } } ) + { "collectionsharded": "social.news", "ok": 1 } + +It turns out that using the posting user's ``_id`` is actually *not* the best +choice for a shard key for ``social.post``. This is because queries +and updates to this collection are done using the ``_id`` field, and sharding on +``by.id``, while tempting, would require these updates to be *broadcast* to all +shards. To shard the ``social.post`` collection on ``_id``, then, you'll need to +execute the following command: + +.. code-block:: pycon + + >>> db.command('shardcollection', 'social.post', { + ...
'key': {'_id': 1 } } ) + { "collectionsharded": "social.post", "ok": 1 } + +.. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding`" wiki + page. diff --git a/source/applications/use-cases/use-case-template.txt b/source/applications/use-cases/use-case-template.txt new file mode 100644 index 00000000000..4ab5ba7cd5f --- /dev/null +++ b/source/applications/use-cases/use-case-template.txt @@ -0,0 +1,60 @@ +.. -*- rst -*- + +:orphan: + +==================== +TODO: Section: Title +==================== + +.. default-domain:: mongodb + +Overview +-------- + +This document outlines the basic patterns and principles for using +MongoDB as a persistent storage engine for TODO: what are we building? + +Problem +~~~~~~~ + +TODO: describe problem + +Solution +~~~~~~~~ + +TODO: describe assumptions, overview of solution + +Schema Design +~~~~~~~~~~~~~ + +TODO: document collections, doc schemas + +Operations +---------- + +TODO: summary of the operations section + +The examples that follow use the Python programming language and the +:api:`PyMongo ` :term:`driver` for MongoDB, but you +can implement this system using any language you choose. + +Operation 1 +~~~~~~~~~~~ + +TODO: describe what the operation is (optional) + +Query +````` + +TODO: describe query + +Index Support +````````````` + +TODO: describe indexes to optimize this query + +Sharding +-------- + +.. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding`" wiki + page.