diff --git a/source/tutorial/usecase/cms-metadata-and-asset-management.txt b/source/tutorial/usecase/cms-metadata-and-asset-management.txt new file mode 100644 index 00000000000..e3a4886c3e0 --- /dev/null +++ b/source/tutorial/usecase/cms-metadata-and-asset-management.txt @@ -0,0 +1,453 @@ +================================== +CMS: Metadata and Asset Management +================================== + +Problem +======= + +You are designing a content management system (CMS) and you want to use +MongoDB to store the content of your sites. + +Solution Overview +================= + +The approach in this solution is inspired by the design of Drupal, an +open source CMS written in PHP on relational databases that is available +at `http://www.drupal.org `_. In this case, you +will take advantage of MongoDB's dynamically typed collections to +*polymorphically* store all your content nodes in the same collection. +Navigational information will be stored in its own collection since +it has relatively little in common with the content nodes, and is not covered in +this use case. + +The main node types which are covered here are: + +Basic page + Basic pages are useful for displaying + infrequently-changing text such as an 'about' page. With a basic + page, the salient information is the title and the + content. +Blog entry + Blog entries record a "stream" of posts from users + on the CMS and store title, author, content, and date as relevant + information. +Photo + Photos participate in photo galleries, and store title, + description, author, and date along with the actual photo binary + data. + +Schema Design +============= + +The node collection contains documents of various formats, but they +will all share a similar structure, with each document including an +``_id``, ``type``, ``section``, ``slug``, ``title``, ``created`` date, +``author``, and ``tags``. The +``section`` property is used to identify groupings of items (grouped to a +particular blog or photo gallery, for instance). The ``slug`` property is +a url-friendly representation of the node that is unique within its +section, and is used for mapping URLs to nodes. Each document also +contains a ``detail`` field which will vary per document type: + +.. code-block:: javascript + + { + _id: ObjectId(…), + nonce: ObjectId(…), + metadata: { + type: 'basic-page' + section: 'my-photos', + slug: 'about', + title: 'About Us', + created: ISODate(...), + author: { _id: ObjectId(…), name: 'Rick' }, + tags: [ ... ], + detail: { text: '# About Us\n…' } + } + } + +For the basic page above, the detail field might simply contain the text +of the page. In the case of a blog entry, the document might resemble +the following instead: + +.. code-block:: javascript + + { + … + metadata: { + … + type: 'blog-entry', + section: 'my-blog', + slug: '2012-03-noticed-the-news', + … + detail: { + publish_on: ISODate(…), + text: 'I noticed the news from Washington today…' + } + } + } + +Photos present something of a special case. Since you'll need to store +potentially very large photos, it's nice to be able to separate the binary +storage of photo data from the metadata storage. GridFS provides just such a +mechanism, splitting a "filesystem" of potentially very large "files" into +two collections, the ``files`` collection and the ``chunks`` collection. In +this case, the two collections will be called ``cms.assets.files`` and +``cms.assets.chunks``. 
Documents in the ``cms.assets.files`` +collection will be used to store the normal GridFS metadata as well as CMS node +metadata: + +.. code-block:: javascript + + { + _id: ObjectId(…), + length: 123..., + chunkSize: 262144, + uploadDate: ISODate(…), + contentType: 'image/jpeg', + md5: 'ba49a...', + metadata: { + nonce: ObjectId(…), + slug: '2012-03-invisible-bicycle', + type: 'photo', + section: 'my-album', + title: 'Kitteh', + created: ISODate(…), + author: { _id: ObjectId(…), name: 'Jared' }, + tags: [ … ], + detail: { + filename: 'kitteh_invisible_bike.jpg', + resolution: [ 1600, 1600 ], … } + } + } + +NOte that the "normal" node schema is embedded here in the photo schema, allowing +the use of the same code to manipulate nodes of all types. + +Operations +========== + +Here, some common queries and updates that you might need for your CMS are +described, paying particular attention to any "tweaks" necessary for the various +node types. The examples use the Python +programming language and the ``pymongo`` MongoDB driver, but implementations +would be similar in other languages as well. + +Create and Edit Content Nodes +----------------------------- + +The content producers using your CMS will be creating and editing content +most of the time. Most content-creation activities are relatively +straightforward: + +.. code-block:: python + + db.cms.nodes.insert({ + 'nonce': ObjectId(), + 'metadata': { + 'section': 'myblog', + 'slug': '2012-03-noticed-the-news', + 'type': 'blog-entry', + 'title': 'Noticed in the News', + 'created': datetime.utcnow(), + 'author': { 'id': user_id, 'name': 'Rick' }, + 'tags': [ 'news', 'musings' ], + 'detail': { + 'publish_on': datetime.utcnow(), + 'text': 'I noticed the news from Washington today…' } + } + }) + +Once the node is in the database, there is a potential problem with +multiple editors. In order to support this, the schema uses the special ``nonce`` +value to detect when another editor may have modified the document and +allow the application to resolve any conflicts: + +.. code-block:: python + + def update_text(section, slug, nonce, text): + result = db.cms.nodes.update( + { 'metadata.section': section, + 'metadata.slug': slug, + 'nonce': nonce }, + { '$set':{'metadata.detail.text': text, 'nonce': ObjectId() } }, + safe=True) + if not result['updatedExisting']: + raise ConflictError() + +You might also want to perform metadata edits to the item such as adding +tags: + +.. code-block:: python + + db.cms.nodes.update( + { 'metadata.section': section, 'metadata.slug': slug }, + { '$addToSet': { 'tags': { '$each': [ 'interesting', 'funny' ] } } }) + +In this case, you don't actually need to supply the nonce (nor update it) +since you're using the atomic ``$addToSet`` modifier in MongoDB. + +Index Support +~~~~~~~~~~~~~ + +Updates in this case are based on equality queries containing the +(``section``, ``slug``, and ``nonce``) values. To support these queries, you +*might* use the following index: + +.. code-block:: python + + >>> db.cms.nodes.ensure_index([ + ... ('metadata.section', 1), ('metadata.slug', 1), ('nonce', 1) ]) + +Also note, however, that you'd like to ensure that two editors don't +create two documents with the same section and slug. To support this, you need a +second index with a unique constraint: + +.. code-block:: python + + >>> db.cms.nodes.ensure_index([ + ... 
('metadata.section', 1), ('metadata.slug', 1)], unique=True)

In fact, since the expectation is that (``section``, ``slug``, ``nonce``) will
almost always be unique, the first index adds little value, and the second,
unique index can satisfy the update queries on its own.

Upload a Photo
--------------

Uploading photos shares some characteristics with node updates, but it also
has some extra nuances:

.. code-block:: python

    def upload_new_photo(
        input_file, section, slug, title, author, tags, detail):
        fs = GridFS(db, 'cms.assets')
        with fs.new_file(
            content_type='image/jpeg',
            metadata=dict(
                type='photo',
                locked=datetime.utcnow(),
                section=section,
                slug=slug,
                title=title,
                created=datetime.utcnow(),
                author=author,
                tags=tags,
                detail=detail)) as upload_file:
            while True:
                chunk = input_file.read(upload_file.chunk_size)
                if not chunk: break
                upload_file.write(chunk)
        # unlock the file
        db.cms.assets.files.update(
            {'_id': upload_file._id},
            {'$set': { 'metadata.locked': None } } )

Here, since uploading the photo is a non-atomic operation, you need to
"lock" the file during upload by writing the current datetime into the
record. This lets the application detect when a file upload may have
stalled, which is helpful when working with multiple editors. This solution
assumes that, for photo uploads, the last update wins:

.. code-block:: python

    def update_photo_content(input_file, section, slug):
        fs = GridFS(db, 'cms.assets')

        # Delete the old version if it's unlocked or was locked more than 5
        # minutes ago
        file_obj = db.cms.assets.files.find_one(
            { 'metadata.section': section,
              'metadata.slug': slug,
              'metadata.locked': None })
        if file_obj is None:
            threshold = datetime.utcnow() - timedelta(seconds=300)
            file_obj = db.cms.assets.files.find_one(
                { 'metadata.section': section,
                  'metadata.slug': slug,
                  'metadata.locked': { '$lt': threshold } })
        if file_obj is None: raise FileDoesNotExist()
        fs.delete(file_obj['_id'])

        # update content, keep metadata unchanged
        file_obj['metadata']['locked'] = datetime.utcnow()
        with fs.new_file(**file_obj) as upload_file:
            while True:
                chunk = input_file.read(upload_file.chunk_size)
                if not chunk: break
                upload_file.write(chunk)
        # unlock the file
        db.cms.assets.files.update(
            {'_id': upload_file._id},
            {'$set': { 'metadata.locked': None } } )

You can, of course, perform metadata edits to the item such as adding tags
without the extra complexity:

.. code-block:: python

    db.cms.assets.files.update(
        { 'metadata.section': section, 'metadata.slug': slug },
        { '$addToSet': {
            'metadata.tags': { '$each': [ 'interesting', 'funny' ] } } })

Index Support
~~~~~~~~~~~~~

Updates here are also based on equality queries containing the
(``section``, ``slug``) values, so you can use the same types of indexes as
in the "regular" node case. Note in particular that you need a unique
constraint on (``section``, ``slug``) to ensure that one of the calls to
``GridFS.new_file()`` will fail if multiple editors try to create or update
the file simultaneously:

.. code-block:: python

    >>> db.cms.assets.files.ensure_index([
    ...     ('metadata.section', 1), ('metadata.slug', 1)], unique=True)
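As a usage illustration, the following sketch drives ``upload_new_photo()``
with values taken from the photo schema shown earlier; the local file path
and tag list are hypothetical, and ``user_id`` is assumed to be the editor's
``_id`` as in the earlier node examples.

.. code-block:: python

    # Hypothetical call to upload_new_photo(); the file path, tags, and
    # resolution are illustrative values only.
    author = { '_id': user_id, 'name': 'Jared' }

    with open('kitteh_invisible_bike.jpg', 'rb') as input_file:
        upload_new_photo(
            input_file,
            section='my-album',
            slug='2012-03-invisible-bicycle',
            title='Kitteh',
            author=author,
            tags=['kittehs'],
            detail={ 'filename': 'kitteh_invisible_bike.jpg',
                     'resolution': [1600, 1600] })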
Locate and Render a Node
------------------------

You need to be able to locate a node based on its section and slug, which
have been extracted from the page definition and URL by some other
technology:

.. code-block:: python

    node = db.cms.nodes.find_one(
        {'metadata.section': section, 'metadata.slug': slug })

Index Support
~~~~~~~~~~~~~

The same index defined above on (``section``, ``slug``) makes this query
efficient.

Locate and Render a Photo
-------------------------

You want to locate an image based on its section and slug, which have been
extracted from the page definition and URL just as with other nodes:

.. code-block:: python

    fs = GridFS(db, 'cms.assets')
    with fs.get_version(
        **{'metadata.section': section, 'metadata.slug': slug }) as img_fp:
        # do something with the image file

Index Support
~~~~~~~~~~~~~

The same index defined above on (``section``, ``slug``) also makes this
query efficient.

Search for Nodes by Tag
-----------------------

You'd like to retrieve a list of nodes based on their tags:

.. code-block:: python

    nodes = db.cms.nodes.find({'metadata.tags': tag })

Index Support
~~~~~~~~~~~~~

To support searching efficiently, you should define indexes on any fields
you intend to use in your queries:

.. code-block:: python

    >>> db.cms.nodes.ensure_index('metadata.tags')

Search for Images by Tag
------------------------

Here, you'd like to retrieve a list of images based on their tags:

.. code-block:: python

    fs = GridFS(db, 'cms.assets')
    for image_file_object in db.cms.assets.files.find(
        {'metadata.tags': tag }):
        image_file = fs.get(image_file_object['_id'])
        # do something with the image file

Index Support
~~~~~~~~~~~~~

As above, in order to support searching efficiently, you should define
indexes on any fields you expect to use in the query:

.. code-block:: python

    >>> db.cms.assets.files.ensure_index('metadata.tags')

Generate a Feed of Recently Published Blog Articles
---------------------------------------------------

Here, you need to generate an RSS or Atom feed for your recently published
blog articles, sorted by publication date, descending:

.. code-block:: python

    articles = db.cms.nodes.find({
        'metadata.section': 'my-blog',
        'metadata.detail.publish_on': { '$lt': datetime.utcnow() } })
    articles = articles.sort('metadata.detail.publish_on', -1)

Index Support
~~~~~~~~~~~~~

In order to support this operation, you'll need an index on (``section``,
``publish_on``) so the items are 'in order' for the query. Note that in
cases where you're sorting or using range queries, as here, the field on
which you're sorting or performing a range query must be the final field in
the index:

.. code-block:: python

    >>> db.cms.nodes.ensure_index(
    ...     [ ('metadata.section', 1), ('metadata.detail.publish_on', -1) ])
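To show how the cursor above might feed a template, here is a minimal sketch
that caps the feed at 20 entries and pulls out only the fields a feed
renderer typically needs; the entry limit and the URL scheme are
illustrative assumptions, not part of the schema.

.. code-block:: python

    # Sketch: materialize feed entries from the `articles` cursor above.
    # The 20-entry cap and the URL format are arbitrary choices.
    feed_entries = [
        { 'title': a['metadata']['title'],
          'published': a['metadata']['detail']['publish_on'],
          'link': '/%s/%s' % (a['metadata']['section'], a['metadata']['slug']),
          'body': a['metadata']['detail']['text'] }
        for a in articles.limit(20) ]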
Sharding
========

In a CMS, read performance is generally much more important than write
performance, so you'll want to optimize the sharding setup for reads. In
order to achieve the best read performance, you need to ensure that queries
are *routeable* by the mongos process. A second consideration when sharding
is that unique indexes do not span shards, so the shard key must include the
uniquely indexed fields in order to keep the same semantics as described
above. Given these constraints, sharding the nodes and assets on
(``section``, ``slug``) is a reasonable approach:

.. code-block:: python

    >>> db.command('shardcollection', 'cms.nodes', {
    ...     'key': { 'metadata.section': 1, 'metadata.slug' : 1 } })
    { "collectionsharded" : "cms.nodes", "ok" : 1 }
    >>> db.command('shardcollection', 'cms.assets.files', {
    ...     'key': { 'metadata.section': 1, 'metadata.slug' : 1 } })
    { "collectionsharded" : "cms.assets.files", "ok" : 1 }

If you wish to shard the ``cms.assets.chunks`` collection, you need to shard
on the ``files_id`` field, since none of the node metadata is available on
the ``cms.assets.chunks`` collection in GridFS:

.. code-block:: python

    >>> db.command('shardcollection', 'cms.assets.chunks', {
    ...     'key': { 'files_id': 1 } })
    { "collectionsharded" : "cms.assets.chunks", "ok" : 1 }

This still maintains the query-routability constraint, since all reads from
GridFS must first look up the document in ``cms.assets.files`` and then look
up the chunks separately (though the GridFS API sometimes hides this
detail).

diff --git a/source/tutorial/usecase/cms-storing-comments.txt b/source/tutorial/usecase/cms-storing-comments.txt
new file mode 100644
index 00000000000..fd3551fa696
--- /dev/null
+++ b/source/tutorial/usecase/cms-storing-comments.txt
@@ -0,0 +1,600 @@
=====================
CMS: Storing Comments
=====================

Problem
=======

In your content management system (CMS), you would like to store
user-generated comments on the various types of content you generate.

Solution Overview
=================

Rather than describing the One True Way to implement comments, this use case
explores different options and the trade-offs of each. The three major
designs discussed here are:

One document per comment
    This provides the greatest degree of flexibility, as it is relatively
    straightforward to display the comments either threaded or
    chronologically. There are also no restrictions on the number of
    comments that can participate in a discussion.
All comments embedded
    In this design, all the comments are embedded in their parent document,
    whether that be a blog article, news story, or forum topic. This can be
    the highest-performance design, but it is also the most restrictive, as
    the display format of the comments is tied to the embedded structure.
    There are also potential problems with extremely active discussions
    where the total data (topic data + comments) exceeds the 16MB limit
    that MongoDB places on documents.
Hybrid design
    Here, you store comments separately from their parent topic, but
    aggregate them into a few documents, each containing many comments.

Another decision to consider in designing a commenting system is whether to
support threaded commenting (explicit replies to a parent comment). That
decision is discussed for each design below.

Schema Design: One Document Per Comment
=======================================

A comment in the one-document-per-comment format might have a structure
similar to the following:

.. code-block:: javascript

    {
        _id: ObjectId(...),
        discussion_id: ObjectId(...),
        slug: '34db',
        posted: ISODate(...),
        author: { id: ObjectId(...), name: 'Rick' },
        text: 'This is so bogus ... '
    }

The format above is really only suitable for chronological display of
commentary. It maintains a reference to the discussion in which the comment
participates, a url-friendly ``slug`` to identify it, the ``posted`` time
and ``author``, and the comment's ``text``. If you want to support threading
in this format, you need to maintain some notion of hierarchy in the comment
model as well:

.. 
code-block:: javascript + + { + _id: ObjectId(...), + discussion_id: ObjectId(...), + parent_id: ObjectId(...), + slug: '34db/8bda', + full_slug: '34db:2012.02.08.12.21.08/8bda:2012.02.09.22.19.16', + posted: ISODateTime(...), + author: { id: ObjectId(...), name: 'Rick' }, + text: 'This is so bogus ... ' + } + +Here, the schema includes some extra information into the document that +represents this document's position in the hierarchy. In addition to +maintaining the ``parent_id`` for the comment, the slug format has been modified +and a new field ``full_slug`` has been added. The slug is now a path +consisting of the parent's slug plus the comment's unique slug portion. +The ``full_slug`` is also included to facilitate sorting documents in a +threaded discussion by posting date. + +Operations: One Comment Per Document +==================================== + +Here, some common operations that you might need for your CMS are +described in the context of the single comment per document schema. All of the +following examples use the Python programming language and the ``pymongo`` +MongoDB driver, but implementations would be similar in other languages as well. + +Post a New Comment +------------------ + +In order to post a new comment in a chronologically ordered (unthreaded) +system, all you need to do is ``insert()``: + +.. code-block:: python + + slug = generate_psuedorandom_slug() + db.comments.insert({ + 'discussion_id': discussion_id, + 'slug': slug, + 'posted': datetime.utcnow(), + 'author': author_info, + 'text': comment_text }) + +In the case of a threaded discussion, there is a bit more work to do in +order to generate a "pathed" ``slug`` and ``full_slug``: + +.. code-block:: python + + posted = datetime.utcnow() + + # generate the unique portions of the slug and full_slug + slug_part = generate_psuedorandom_slug() + full_slug_part = slug_part + ':' + posted.strftime( + '%Y.%m.%d.%H.%M.%S') + + # load the parent comment (if any) + if parent_slug: + parent = db.comments.find_one( + {'discussion_id': discussion_id, 'slug': parent_slug }) + slug = parent['slug'] + '/' + slug_part + full_slug = parent['full_slug'] + '/' + full_slug_part + else: + slug = slug_part + full_slug = full_slug_part + + # actually insert the comment + db.comments.insert({ + 'discussion_id': discussion_id, + 'slug': slug, 'full_slug': full_slug, + 'posted': posted, + 'author': author_info, + 'text': comment_text }) + +View the (Paginated) Comments for a Discussion +---------------------------------------------- + +To actually view the comments in the non-threaded design, you need merely +to select all comments participating in a discussion, sorted by ``posted``: + +.. code-block:: python + + cursor = db.comments.find({'discussion_id': discussion_id}) + cursor = cursor.sort('posted') + cursor = cursor.skip(page_num * page_size) + cursor = cursor.limit(page_size) + +Since the ``full_slug`` embeds both hierarchical information via the path +and chronological information, you can use a simple sort on the +``full_slug`` property to retrieve a threaded view: + +.. code-block:: python + + cursor = db.comments.find({'discussion_id': discussion_id}) + cursor = cursor.sort('full_slug') + cursor = cursor.skip(page_num * page_size) + cursor = cursor.limit(page_size) + +Index Support +~~~~~~~~~~~~~ + +In order to efficiently support the queries above, you should maintain +two compound indexes, one on (``discussion_id``, ``posted``), and the other on +(``discussion_id``, ``full_slug``): + +.. 
code-block:: python

    >>> db.comments.ensure_index([
    ...     ('discussion_id', 1), ('posted', 1)])
    >>> db.comments.ensure_index([
    ...     ('discussion_id', 1), ('full_slug', 1)])

Note that you must ensure that the final element in a compound index is
the field by which you are sorting to ensure efficient performance of
these queries.

Retrieve a Comment Via Slug ("Permalink")
-----------------------------------------

Suppose you wish to directly retrieve a comment (i.e. *not* requiring
paging through all preceding pages of commentary). In this case, you'd
simply use the ``slug``:

.. code-block:: python

    comment = db.comments.find_one({
        'discussion_id': discussion_id,
        'slug': comment_slug})

You can also retrieve a sub-discussion (a comment and all of its
descendants recursively) by performing a prefix query on the ``full_slug``
field:

.. code-block:: python

    subdiscussion = db.comments.find({
        'discussion_id': discussion_id,
        'full_slug': re.compile('^' + re.escape(parent_slug)) })
    subdiscussion = subdiscussion.sort('full_slug')

Index Support
~~~~~~~~~~~~~

Since you already have the index on (``discussion_id``, ``full_slug``)
necessary to support retrieval of sub-discussions, all you need to add here
is an index on (``discussion_id``, ``slug``) to efficiently support
retrieval of a comment by 'permalink':

.. code-block:: python

    >>> db.comments.ensure_index([
    ...     ('discussion_id', 1), ('slug', 1)])

Schema Design: All Comments Embedded
====================================

In this design, you wish to embed an entire discussion within its topic
document, be it a blog article, news story, or discussion thread. A topic
document, then, might look something like the following:

.. code-block:: javascript

    {
        _id: ObjectId(...),
        ... lots of topic data ...
        comments: [
            { posted: ISODate(...),
              author: { id: ObjectId(...), name: 'Rick' },
              text: 'This is so bogus ... ' },
            ... ]
    }

The format above is really only suitable for chronological display of
commentary. The comments are embedded in chronological order, with their
posting date, author, and text. Note that, since you're storing the
comments in sorted order, there is no longer any need to maintain a slug
per comment. If you wanted to support threading in the embedded format,
you'd need to embed comments within comments:

.. code-block:: javascript

    {
        _id: ObjectId(...),
        ... lots of topic data ...
        replies: [
            { posted: ISODate(...),
              author: { id: ObjectId(...), name: 'Rick' },
              text: 'This is so bogus ... ',
              replies: [
                  { author: { ... }, ... },
                  ... ]
            },
            ... ]
    }

Here, a ``replies`` property is added to each comment, which can hold
sub-comments and so on. One thing in particular to note about the embedded
document formats is that you give up some flexibility when embedding the
comments, effectively "baking in" the decisions made about the proper
display format. If you (or your users) someday wish to switch from
chronological to threaded display, or vice-versa, this schema makes such a
migration quite expensive.

In popular discussions, you also might have an issue with document size.
If you have a particularly avid discussion, for example, it might outgrow
the 16MB limit that MongoDB places on document size. You can also run into
scaling issues, particularly in the threaded design, as documents need to
be frequently moved on disk as they outgrow the space allocated to them.
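To make the document-growth concern concrete, you can measure how large a
discussion document has become before deciding whether to keep embedding.
This is only a sketch: it assumes PyMongo's ``bson`` package and uses an
arbitrary 12MB warning threshold well under MongoDB's 16MB cap.

.. code-block:: python

    import bson

    DOC_LIMIT = 16 * 1024 * 1024       # MongoDB's per-document limit
    WARN_THRESHOLD = 12 * 1024 * 1024  # arbitrary safety margin

    def discussion_bson_size(discussion_id):
        # Encode the full topic document to measure its current BSON size
        doc = db.discussion.find_one({'_id': discussion_id})
        return len(bson.BSON.encode(doc))

    if discussion_bson_size(discussion_id) > WARN_THRESHOLD:
        # approaching the 16MB cap; consider the hybrid design described below
        pass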
Operations: All Comments Embedded
=================================

Here, some common operations that you might need for your CMS are described
in the context of the embedded-comment schema. Once again, the examples are
in Python. Note that, in all the cases below, there is no need for
additional indexes since all the operations are intra-document, and the
document itself (the "discussion") is retrieved by its ``_id`` field, which
is automatically indexed by MongoDB anyway.

Post a New Comment
------------------

In order to post a new comment in a chronologically ordered (unthreaded)
system, you need the following ``update()``:

.. code-block:: python

    db.discussion.update(
        { '_id': discussion_id },
        { '$push': { 'comments': {
            'posted': datetime.utcnow(),
            'author': author_info,
            'text': comment_text } } } )

Note that since you used the ``$push`` operator, the comments are appended
in their correct chronological order. In the case of a threaded discussion,
there is a good bit more work to do. In order to reply to a comment, the
code below assumes that it has access to the 'path' to the comment you're
replying to as a list of positions:

.. code-block:: python

    if path != []:
        str_path = '.'.join('replies.%d' % part for part in path)
        str_path += '.replies'
    else:
        str_path = 'replies'
    db.discussion.update(
        { '_id': discussion_id },
        { '$push': {
            str_path: {
                'posted': datetime.utcnow(),
                'author': author_info,
                'text': comment_text } } } )

Here, you first construct a field name of the form
``replies.0.replies.2...`` as ``str_path`` and then use it to ``$push`` the
new comment into its parent comment's ``replies`` array.

View the (Paginated) Comments for a Discussion
----------------------------------------------

To actually view the comments in the non-threaded design, you need to use
the ``$slice`` operator:

.. code-block:: python

    discussion = db.discussion.find_one(
        {'_id': discussion_id},
        { ... some fields relevant to your page from the root discussion ...,
          'comments': { '$slice': [ page_num * page_size, page_size ] }
        })

If you wish to view paginated comments for the threaded design, you need
to retrieve the whole document and paginate in your application:

.. code-block:: python

    discussion = db.discussion.find_one({'_id': discussion_id})

    def iter_comments(obj):
        for reply in obj['replies']:
            yield reply
            for subreply in iter_comments(reply):
                yield subreply

    paginated_comments = itertools.islice(
        iter_comments(discussion),
        page_size * page_num,
        page_size * (page_num + 1))
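If the display layer also needs to indent replies, a small variant of the
generator above can carry the nesting depth along with each comment. This
is only a sketch of one way to do it; the ``(depth, comment)`` convention
is an illustrative choice.

.. code-block:: python

    import itertools

    def iter_comments_with_depth(obj, depth=0):
        # Walk the embedded reply tree in display order, yielding
        # (depth, comment) pairs so a template can indent each reply.
        for reply in obj.get('replies', []):
            yield depth, reply
            for pair in iter_comments_with_depth(reply, depth + 1):
                yield pair

    paginated_comments = itertools.islice(
        iter_comments_with_depth(discussion),
        page_size * page_num,
        page_size * (page_num + 1))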
Retrieve a Comment Via Position or Path ("Permalink")
-----------------------------------------------------

Instead of using slugs as above, this example retrieves comments by their
position in the comment list or tree. In the case of the chronological
(non-threaded) design, you simply need to use the ``$slice`` operator to
extract the correct comment:

.. code-block:: python

    discussion = db.discussion.find_one(
        {'_id': discussion_id},
        {'comments': { '$slice': [ position, 1 ] } })
    comment = discussion['comments'][0]

In the case of the threaded design, you're faced with the task of finding
the correct path through the tree in your application:

.. code-block:: python

    discussion = db.discussion.find_one({'_id': discussion_id})
    current = discussion
    for part in path:
        current = current['replies'][part]
    comment = current

Note that, since the replies to comments are embedded in their parents,
you have actually retrieved the entire sub-discussion rooted in the comment
you were looking for as well.

Schema Design: Hybrid
=====================

Comments in the hybrid format are stored in 'buckets' of about 100 comments
each:

.. code-block:: javascript

    {
        _id: ObjectId(...),
        discussion_id: ObjectId(...),
        page: 1,
        count: 42,
        comments: [ {
            slug: '34db',
            posted: ISODate(...),
            author: { id: ObjectId(...), name: 'Rick' },
            text: 'This is so bogus ... ' },
        ... ]
    }

Here, you maintain a "page" of comment data, containing a bit of metadata
about the page (in particular, the page number and the comment count), as
well as the comment bodies themselves. Using the hybrid format makes
storing comments hierarchically quite complex, so that approach is not
covered in this document.

Note that in this design, 100 comments is a 'soft' limit on the number of
comments per page, chosen mainly for performance reasons and to ensure
that a comment page never grows beyond the 16MB limit MongoDB imposes on
document size. There may be occasions when the number of comments is
slightly larger than 100, but this does not affect the correctness of the
design.

Operations: Hybrid
==================

Here, some common operations that you might need for your CMS are described
in the context of 100-comment "pages". Once again, the examples are in
Python.

Post a New Comment
------------------

In order to post a new comment, you need to ``$push`` the comment onto the
last page and ``$inc`` that page's comment count. If the page then has more
than 100 comments, you must insert a new page as well. This operation
starts with a reference to the discussion document, and assumes that the
discussion document has a property that tracks the number of pages:

.. code-block:: python

    page = db.comment_pages.find_and_modify(
        { 'discussion_id': discussion['_id'],
          'page': discussion['num_pages'] },
        { '$inc': { 'count': 1 },
          '$push': {
              'comments': { 'slug': slug, ... } } },
        fields={'count':1},
        upsert=True,
        new=True )

Note that the ``find_and_modify()`` above is written as an upsert
operation; if MongoDB doesn't find the requested page number, the
``find_and_modify()`` will create it for you, initialized with appropriate
values for ``count`` and ``comments``. Since you're limiting the number of
comments per page to around 100, you also need to create new pages as they
become necessary:

.. code-block:: python

    if page['count'] > 100:
        db.discussion.update(
            { '_id': discussion['_id'],
              'num_pages': discussion['num_pages'] },
            { '$inc': { 'num_pages': 1 } } )

The update here includes the last known number of pages in the query in
order to ensure that you don't have a race condition where the number of
pages is double-incremented, resulting in a nearly or totally empty page.
If some other process has already incremented the number of pages in the
discussion, then the update above simply does nothing.
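Putting the two steps together, a single helper function along the
following lines can keep the calling code simple. This is a sketch built
from the operations above; the function name and the exact comment fields
pushed are illustrative.

.. code-block:: python

    from datetime import datetime

    def post_comment(discussion, slug, author_info, comment_text):
        # Push onto the discussion's last page, creating it if necessary
        page = db.comment_pages.find_and_modify(
            { 'discussion_id': discussion['_id'],
              'page': discussion['num_pages'] },
            { '$inc': { 'count': 1 },
              '$push': { 'comments': {
                  'slug': slug,
                  'posted': datetime.utcnow(),
                  'author': author_info,
                  'text': comment_text } } },
            fields={'count': 1},
            upsert=True,
            new=True)

        # If the page just passed the soft limit, allocate the next page
        if page['count'] > 100:
            db.discussion.update(
                { '_id': discussion['_id'],
                  'num_pages': discussion['num_pages'] },
                { '$inc': { 'num_pages': 1 } } )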
Index Support
~~~~~~~~~~~~~

In order to efficiently support the ``find_and_modify()`` and ``update()``
operations above, you need to maintain a compound index on
(``discussion_id``, ``page``) in the ``comment_pages`` collection:

.. code-block:: python

    >>> db.comment_pages.ensure_index([
    ...     ('discussion_id', 1), ('page', 1)])

View the (Paginated) Comments for a Discussion
----------------------------------------------

In order to paginate comments with a fixed page size (i.e. not the roughly
100 comments in a database "page"), you need to do a bit of extra work in
Python:

.. code-block:: python

    def find_comments(discussion_id, skip, limit):
        result = []
        page_query = db.comment_pages.find(
            { 'discussion_id': discussion_id },
            { 'count': 1, 'comments': { '$slice': [ skip, limit ] } })
        page_query = page_query.sort('page')
        for page in page_query:
            result += page['comments']
            skip = max(0, skip - page['count'])
            limit -= len(page['comments'])
            if limit == 0: break
        return result

Here, the ``$slice`` operator is used to pull out comments from each page,
but *only* if the ``skip`` requirement is satisfied. An example helps
illustrate the logic. Suppose you have four pages with 100, 102, 100, and
22 comments, respectively, and you wish to retrieve comments with skip=300
and limit=50. The algorithm proceeds as follows:

+-------+-------+-------------------------------------------------------+
| Skip  | Limit | Description                                           |
+=======+=======+=======================================================+
| 300   | 50    | ``{$slice: [ 300, 50 ] }`` matches nothing in page    |
|       |       | #1; subtract page #1's ``count`` from ``skip`` and    |
|       |       | continue.                                             |
+-------+-------+-------------------------------------------------------+
| 200   | 50    | ``{$slice: [ 200, 50 ] }`` matches nothing in page    |
|       |       | #2; subtract page #2's ``count`` from ``skip`` and    |
|       |       | continue.                                             |
+-------+-------+-------------------------------------------------------+
| 98    | 50    | ``{$slice: [ 98, 50 ] }`` matches 2 comments in page  |
|       |       | #3; subtract page #3's ``count`` from ``skip``        |
|       |       | (saturating at 0), subtract 2 from ``limit``, and     |
|       |       | continue.                                             |
+-------+-------+-------------------------------------------------------+
| 0     | 48    | ``{$slice: [ 0, 48 ] }`` matches all 22 comments in   |
|       |       | page #4; subtract 22 from ``limit`` and continue.     |
+-------+-------+-------------------------------------------------------+
| 0     | 26    | There are no more pages; terminate the loop.          |
+-------+-------+-------------------------------------------------------+

Index Support
~~~~~~~~~~~~~

Since you already have an index on (``discussion_id``, ``page``) in your
``comment_pages`` collection, MongoDB can satisfy these queries
efficiently.

Retrieve a Comment Via Slug ("Permalink")
-----------------------------------------

Suppose you wish to directly retrieve a comment (i.e. *not* requiring
paging through all preceding pages of commentary). In this case, you can
use the slug to find the correct page, and then use the application to
find the correct comment:

.. code-block:: python

    page = db.comment_pages.find_one(
        { 'discussion_id': discussion_id,
          'comments.slug': comment_slug},
        { 'comments': 1 })
    for comment in page['comments']:
        if comment['slug'] == comment_slug:
            break

Index Support
~~~~~~~~~~~~~

Here, you'll need a new index on (``discussion_id``, ``comments.slug``) to
efficiently support retrieving the page containing a comment by its slug:

.. code-block:: python

    >>> db.comment_pages.ensure_index([
    ... 
('discussion_id', 1), ('comments.slug', 1)]) + +Sharding +======== + +In each of the cases above, it's likely that your ``discussion_id`` will at +least participate in the shard key if you should choose to shard. + +In the case of the one document per comment approach, it would be nice +to use the ``slug`` (or ``full_slug``, in the case of threaded comments) as +part of the shard key to allow routing of requests by ``slug``: + +.. code-block:: python + + >>> db.command('shardcollection', 'comments', { + ... key : { 'discussion_id' : 1, 'full_slug': 1 } }) + { "collectionsharded" : "comments", "ok" : 1 } + +In the case of the fully-embedded comments, of course, the discussion is +the only thing needed to shard, and its shard key will probably be +determined by concerns outside the scope of this document. + +In the case of hybrid documents, you'll want to use the page number of the +comment page in the shard key as well as the ``discussion_id`` to allow MongoDB +to split popular discussions among different shards: + +.. code-block:: python + + >>> db.command('shardcollection', 'comment_pages', { + ... key : { 'discussion_id' : 1, 'page': 1 } }) + { "collectionsharded" : "comment_pages", "ok" : 1 } + diff --git a/source/tutorial/usecase/ecommerce-category-hierarchy.txt b/source/tutorial/usecase/ecommerce-category-hierarchy.txt new file mode 100644 index 00000000000..9b03bae98f8 --- /dev/null +++ b/source/tutorial/usecase/ecommerce-category-hierarchy.txt @@ -0,0 +1,249 @@ +============================== +E-Commerce: Category Hierarchy +============================== + +Problem +======= + +You have a product hierarchy for an e-commerce site that you want to +query frequently and update somewhat frequently. + +Solution Overview +================= + +This solution keeps each category in its own document, along with a list of its +ancestors. The category hierarchy used in this example will be +based on different categories of music: + +.. figure:: img/ecommerce-category1.png + :align: center + :alt: Initial category hierarchy + + Initial category hierarchy + +Since categories change relatively infrequently, the focus here will be on the +operations needed to keep the hierarchy up-to-date and less on the performance +aspects of updating the hierarchy. + +Schema Design +============= + +Each category in the hierarchy will be represented by a document. That +document will be identified by an ``ObjectId`` for internal +cross-referencing as well as a human-readable name and a url-friendly +``slug`` property. Additionally, the schema stores an ancestors list along +with each document to facilitate displaying a category along with all +its ancestors in a single query. + +.. code-block:: javascript + + { "_id" : ObjectId("4f5ec858eb03303a11000002"), + "name" : "Modal Jazz", + "parent" : ObjectId("4f5ec858eb03303a11000001"), + "slug" : "modal-jazz", + "ancestors" : [ + { "_id" : ObjectId("4f5ec858eb03303a11000001"), + "slug" : "bop", + "name" : "Bop" }, + { "_id" : ObjectId("4f5ec858eb03303a11000000"), + "slug" : "ragtime", + "name" : "Ragtime" } ] + } + +Operations +========== + +Here, the various category manipulations you may need in an ecommerce site are +described as they would occur using the schema above. The examples use the Python +programming language and the ``pymongo`` MongoDB driver, but implementations +would be similar in other languages as well. + +Read and Display a Category +--------------------------- + +The simplest operation is reading and displaying a hierarchy. 
In this +case, you might want to display a category along with a list of "bread +crumbs" leading back up the hierarchy. In an E-commerce site, you'll +most likely have the slug of the category available for your query, as it can be +parsed from the URL. + +.. code-block:: python + + category = db.categories.find( + {'slug':slug}, + {'_id':0, 'name':1, 'ancestors.slug':1, 'ancestors.name':1 }) + +Here, the slug is used to retrieve the category, fetching only those +fields needed for display. + +Index Support +~~~~~~~~~~~~~ + +In order to support this common operation efficiently, you'll need an index +on the 'slug' field. Since slug is also intended to be unique, the index over it +should be unique as well: + +.. code-block:: python + + db.categories.ensure_index('slug', unique=True) + +Add a Category to the Hierarchy +------------------------------- + +Adding a category to a hierarchy is relatively simple. Suppose you wish +to add a new category 'Swing' as a child of 'Ragtime': + +.. figure:: img/ecommerce-category2.png + :align: center + :alt: Adding a category + + Adding a category + +In this case, the initial insert is simple enough, but after this +insert, the "Swing" category is still missing its ancestors array. To define +this, you'll need a helper function to build the ancestor list: + +.. code-block:: python + + def build_ancestors(_id, parent_id): + parent = db.categories.find_one( + {'_id': parent_id}, + {'name': 1, 'slug': 1, 'ancestors':1}) + parent_ancestors = parent.pop('ancestors') + ancestors = [ parent ] + parent_ancestors + db.categories.update( + {'_id': _id}, + {'$set': { 'ancestors': ancestors } }) + +Note that you only need to travel one level in the hierarchy to get the +ragtime's ancestors and build swing's entire ancestor list. Now you can +actually perform the insert and rebuild the ancestor list: + +.. code-block:: python + + doc = dict(name='Swing', slug='swing', parent=ragtime_id) + swing_id = db.categories.insert(doc) + build_ancestors(swing_id, ragtime_id) + +Index Support +~~~~~~~~~~~~~ + +Since these queries and updates all selected based on ``_id``, you only need +the default MongoDB-supplied index on ``_id`` to support this operation +efficiently. + +Change the Ancestry of a Category +--------------------------------- + +Suppose you wish to reorganize the hierarchy by moving 'bop' under +'swing': + +.. figure:: img/ecommerce-category3.png + :align: center + :alt: Change the parent of a category + + Change the parent of a category + +The initial update is straightforward: + +.. code-block:: python + + db.categories.update( + {'_id':bop_id}, {'$set': { 'parent': swing_id } } ) + +Now, you still need to update the ancestor list for bop and all its +descendants. In this case, you can't guarantee that the ancestor list of +the parent category is always correct, since MongoDB may +process the categories out-of-order. To handle this, you'll need a new +ancestor-building function: + +.. code-block:: python + + def build_ancestors_full(_id, parent_id): + ancestors = [] + while parent_id is not None: + parent = db.categories.find_one( + {'_id': parent_id}, + {'parent': 1, 'name': 1, 'slug': 1, 'ancestors':1}) + parent_id = parent.pop('parent') + ancestors.append(parent) + db.categories.update( + {'_id': _id}, + {'$set': { 'ancestors': ancestors } }) + +Now, at the expense of a few more queries up the hierarchy, you can +easily reconstruct all the descendants of 'bop': + +.. 
code-block:: python + + for cat in db.categories.find( + {'ancestors._id': bop_id}, + {'parent_id': 1}): + build_ancestors_full(cat['_id'], cat['parent_id']) + +Index Support +~~~~~~~~~~~~~ + +In this case, an index on ``ancestors._id`` would be helpful in +determining which descendants need to be updated: + +.. code-block:: python + + db.categories.ensure_index('ancestors._id') + +Rename a Category +----------------- + +Renaming a category would normally be an extremely quick operation, but +in this case due to denormalization, you also need to update the +descendants. Suppose you need to rename "Bop" to "BeBop:" + +.. figure:: img/ecommerce-category4.png + :align: center + :alt: Rename a category + + Rename a category + +First, you need to update the category name itself: + +.. code-block:: python + + db.categories.update( + {'_id':bop_id}, {'$set': { 'name': 'BeBop' } } ) + +Next, you need to update each descendant's ancestors list: + +.. code-block:: python + + db.categories.update( + {'ancestors._id': bop_id}, + {'$set': { 'ancestors.$.name': 'BeBop' } }, + multi=True) + +Here, you can use the positional operation ``$`` to match the exact "ancestor" +entry that matches the query, as well as the ``multi`` option on the +update to ensure the rename operation occurs in a single server +round-trip. + +Index Support +~~~~~~~~~~~~~ + +In this case, the index you have already defined on ``ancestors._id`` is +sufficient to ensure good performance. + +Sharding +======== + +In this solution, it is unlikely that you would want to shard the +collection since it's likely to be quite small. If you *should* decide to +shard, the use of an ``_id`` field for most updates makes it an +ideal sharding candidate. The sharding commands you'd use to shard +the category collection would then be the following: + +.. code-block:: python + + >>> db.command('shardcollection', 'categories') + { "collectionsharded" : "categories", "ok" : 1 } + +Note that there is no need to specify the shard key, as MongoDB will +default to using ``_id`` as a shard key. diff --git a/source/tutorial/usecase/ecommerce-inventory-management.txt b/source/tutorial/usecase/ecommerce-inventory-management.txt new file mode 100644 index 00000000000..0c1b065a27e --- /dev/null +++ b/source/tutorial/usecase/ecommerce-inventory-management.txt @@ -0,0 +1,392 @@ +================================ +E-Commerce: Inventory Management +================================ + +Problem +======= + +You have a product catalog and you would like to maintain an accurate +inventory count as users shop your online store, adding and removing +things from their cart. + +Solution Overview +================= + +In an ideal world, consumers would begin browsing an online store, add +items to their shopping cart, and proceed in a timely manner to checkout +where their credit cards would always be successfully validated and +charged. In the real world, however, customers often add or remove items +from their shopping cart, change quantities, abandon the cart, and have +problems at checkout time. + +This solution keeps the traditional metaphor of the shopping cart, but +the shopping cart will *age* . Once a shopping cart has not been active +for a certain period of time, all the items in the cart once again +become part of available inventory and the cart is cleared. The state +transition diagram for a shopping cart is below: + +.. 
figure:: img/ecommerce-inventory1.png
   :align: center
   :alt: Shopping cart state transitions

Schema Design
=============

In your inventory collection, you need to maintain the current available
inventory of each stock-keeping unit (SKU) as well as a list of 'carted'
items that may be released back to available inventory if their shopping
cart times out:

.. code-block:: javascript

    {
        _id: '00e8da9b',
        qty: 16,
        carted: [
            { qty: 1, cart_id: 42,
              timestamp: ISODate("2012-03-09T20:55:36Z") },
            { qty: 2, cart_id: 43,
              timestamp: ISODate("2012-03-09T21:55:36Z") }
        ]
    }

(Note that, while in an actual implementation you might choose to merge this
schema with the product catalog schema described in
:doc:`E-Commerce: Product Catalog </tutorial/usecase/ecommerce-product-catalog>`,
the inventory schema is simplified here for brevity.) Continuing the
metaphor of the brick-and-mortar store, the SKU above has 16 items on the
shelf, 1 in one cart, and 2 in another, for a total of 19 unsold items of
merchandise.

For the shopping cart model, you need to maintain a list of (``sku``,
``quantity``, ``price``) line items:

.. code-block:: javascript

    {
        _id: 42,
        last_modified: ISODate("2012-03-09T20:55:36Z"),
        status: 'active',
        items: [
            { sku: '00e8da9b', qty: 1, item_details: {...} },
            { sku: '0ab42f88', qty: 4, item_details: {...} }
        ]
    }

Note that the cart model includes item details in each line item. This
allows your app to display the contents of the cart to the user without
needing a second query back to the catalog collection to fetch the details.

Operations
==========

Here, the various inventory-related operations in an ecommerce site are
described as they would occur using the schema above. The examples use the
Python programming language and the ``pymongo`` MongoDB driver, but
implementations would be similar in other languages as well.

Add an Item to a Shopping Cart
------------------------------

The most basic operation is moving an item off the "shelf" and into the
"cart." The constraint is that you would like to guarantee that you never
move an unavailable item off the shelf into the cart. To solve this
problem, this solution ensures that inventory is only updated if there is
sufficient inventory to satisfy the request:

.. code-block:: python

    def add_item_to_cart(cart_id, sku, qty, details):
        now = datetime.utcnow()

        # Make sure the cart is still active and add the line item
        result = db.cart.update(
            {'_id': cart_id, 'status': 'active' },
            { '$set': { 'last_modified': now },
              '$push': {
                  'items': {'sku': sku, 'qty': qty, 'details': details } }
            },
            safe=True)
        if not result['updatedExisting']:
            raise CartInactive()

        # Update the inventory
        result = db.inventory.update(
            {'_id': sku, 'qty': {'$gte': qty}},
            {'$inc': {'qty': -qty},
             '$push': {
                 'carted': { 'qty': qty, 'cart_id': cart_id,
                             'timestamp': now } } },
            safe=True)
        if not result['updatedExisting']:
            # Roll back our cart update
            db.cart.update(
                {'_id': cart_id },
                { '$pull': { 'items': {'sku': sku } } }
            )
            raise InadequateInventory()

Note here in particular that the system does not trust that the request is
satisfiable. The first check makes sure that the cart is still "active"
(more on inactive carts below) before adding a line item. The next check
verifies that sufficient inventory exists to satisfy the request before
decrementing inventory. In the case of inadequate inventory, the system
*compensates* for the non-transactional nature of MongoDB by removing the
cart update. Using ``safe=True`` and checking the result of these two
updates allows you to report an error back to the user if the cart has
become inactive or the available quantity is insufficient to satisfy the
request.
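As a usage sketch, the calling code can translate these exceptions into
user-facing messages; the SKU, the item details, and the
``display_error()`` helper below are hypothetical.

.. code-block:: python

    # Hypothetical view-layer code calling add_item_to_cart()
    try:
        add_item_to_cart(cart_id, sku='00e8da9b', qty=1,
                         details={'title': 'A Love Supreme', 'price': 1100})
    except CartInactive:
        display_error("Your cart has expired. Please start a new one.")
    except InadequateInventory:
        display_error("Sorry, we don't have enough of that item in stock.")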
Index Support
~~~~~~~~~~~~~

To support this operation efficiently, all you really need is the index on
``_id``, which MongoDB provides by default.

Modifying the Quantity in the Cart
----------------------------------

Here, you'd like to allow the user to adjust the quantity of items in their
cart. The system must ensure that when a quantity is adjusted upward, there
is sufficient inventory to cover the increase, and it must also update the
particular ``carted`` entry for the user's cart:

.. code-block:: python

    def update_quantity(cart_id, sku, old_qty, new_qty):
        now = datetime.utcnow()
        delta_qty = new_qty - old_qty

        # Make sure the cart is still active and update the line item
        result = db.cart.update(
            {'_id': cart_id, 'status': 'active', 'items.sku': sku },
            {'$set': {
                'last_modified': now,
                'items.$.qty': new_qty },
            },
            safe=True)
        if not result['updatedExisting']:
            raise CartInactive()

        # Update the inventory
        result = db.inventory.update(
            {'_id': sku,
             'carted.cart_id': cart_id,
             'qty': {'$gte': delta_qty} },
            {'$inc': {'qty': -delta_qty },
             '$set': { 'carted.$.qty': new_qty,
                       'carted.$.timestamp': now } },
            safe=True)
        if not result['updatedExisting']:
            # Roll back our cart update
            db.cart.update(
                {'_id': cart_id, 'items.sku': sku },
                {'$set': { 'items.$.qty': old_qty }
                })
            raise InadequateInventory()

Note in particular the use of the positional operator ``$`` to update the
particular ``carted`` entry and line item that matched the query. This
allows the system to update the inventory and keep track of the data
necessary to "roll back" the cart in a single atomic operation. The code
above also ensures the cart is active and timestamps it, as in the case of
adding items to the cart.

Index Support
~~~~~~~~~~~~~

To support this operation efficiently, again all you really need is the
default index on ``_id``.

Checking Out
------------

During checkout, you'd like to validate the method of payment and remove
the various ``carted`` items after the transaction has succeeded:

.. code-block:: python

    def checkout(cart_id):
        now = datetime.utcnow()

        # Make sure the cart is still active and set to 'pending'. Also
        # fetch the cart details so we can calculate the checkout price
        cart = db.cart.find_and_modify(
            {'_id': cart_id, 'status': 'active' },
            update={'$set': { 'status': 'pending', 'last_modified': now } } )
        if cart is None:
            raise CartInactive()

        # Validate payment details; collect payment
        try:
            collect_payment(cart)
            db.cart.update(
                {'_id': cart_id },
                {'$set': { 'status': 'complete' } } )
            db.inventory.update(
                {'carted.cart_id': cart_id},
                {'$pull': { 'carted': {'cart_id': cart_id} } },
                multi=True)
        except:
            db.cart.update(
                {'_id': cart_id },
                {'$set': { 'status': 'active' } } )
            raise

Here, the cart is first "locked" by setting its status to "pending"
(disabling any modifications). MongoDB's ``findAndModify`` command is used
to verify that the cart is still active and to atomically update it,
returning its details so you can collect payment. If the payment is
successful, you then remove the ``carted`` entries from the individual
items' inventory documents and set the cart to "complete." If payment is
unsuccessful, you unlock the cart by setting its status back to "active"
and report a payment error.
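From the caller's point of view, a payment failure surfaces as whatever
exception ``collect_payment()`` raised, with the cart already returned to
the ``active`` state. A sketch of the calling code follows; the
``display_error()`` helper and the broad exception handling are
illustrative only.

.. code-block:: python

    # Hypothetical order-processing code calling checkout()
    try:
        checkout(cart_id)
    except CartInactive:
        display_error("This cart has already been checked out or has expired.")
    except Exception:
        # collect_payment() failed; the cart status is back to 'active'
        display_error("Payment failed. Please check your payment details.")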
Index Support
~~~~~~~~~~~~~

Once again, the default index on ``_id`` is enough to make this operation
efficient.

Returning Timed-Out Items to Inventory
--------------------------------------

Periodically, you'd like to expire carts that have been inactive for a
given number of seconds, returning their line items to available
inventory:

.. code-block:: python

    def expire_carts(timeout):
        now = datetime.utcnow()
        threshold = now - timedelta(seconds=timeout)

        # Lock and find all the expiring carts
        db.cart.update(
            {'status': 'active', 'last_modified': { '$lt': threshold } },
            {'$set': { 'status': 'expiring' } },
            multi=True )

        # Actually expire each cart
        for cart in db.cart.find({'status': 'expiring'}):

            # Return all line items to inventory
            for item in cart['items']:
                db.inventory.update(
                    { '_id': item['sku'],
                      'carted.cart_id': cart['_id'],
                      'carted.qty': item['qty']
                    },
                    {'$inc': { 'qty': item['qty'] },
                     '$pull': { 'carted': { 'cart_id': cart['_id'] } } })

            db.cart.update(
                {'_id': cart['_id'] },
                {'$set': { 'status': 'expired' } } )

Here, you first find all carts to be expired and then, for each cart,
return its items to inventory. Once all items have been returned to
inventory, the cart is moved to the 'expired' state.

Index Support
~~~~~~~~~~~~~

In this case, you need to be able to efficiently query carts based on
their ``status`` and ``last_modified`` values, so an index on these fields
helps the performance of the periodic expiration process:

.. code-block:: python

    >>> db.cart.ensure_index([('status', 1), ('last_modified', 1)])

Note in particular the order in which the index is defined: in order to
efficiently support range queries (``$lt`` in this case), the ranged field
must be the last field in the index. Also note that there is no need to
define an index on the ``status`` field alone, as any queries on status
can use the compound index defined here.

Error Handling
--------------

There is one failure mode that has not been handled adequately thus far:
the case of an exception that occurs after updating the inventory
collection but before updating the shopping cart. The result of this
failure mode is a shopping cart that may be absent or expired while the
'carted' items in the inventory collection have never been returned to
available inventory. To account for this case, you'll need to run a
cleanup method periodically that finds old ``carted`` items and checks the
status of their cart:

.. 
code-block:: python + + def cleanup_inventory(timeout): + now = datetime.utcnow() + threshold = now - timedelta(seconds=timeout) + + # Find all the expiring carted items + for item in db.inventory.find( + {'carted.timestamp': {'$lt': threshold }}): + + # Find all the carted items that matched + carted = dict( + (carted_item['cart_id'], carted_item) + for carted_item in item['carted'] + if carted_item['timestamp'] < threshold) + + # Find any carts that are active and refresh the carted items + for cart in db.cart.find( + { '_id': {'$in': carted.keys() }, + 'status':'active'}): + cart = carted[cart['_id']] + db.inventory.update( + { '_id': item['_id'], + 'carted.cart_id': cart['_id'] }, + { '$set': {'carted.$.timestamp': now } }) + del carted[cart['_id']] + + # All the carted items left in the dict need to now be + # returned to inventory + for cart_id, carted_item in carted.items(): + db.inventory.update( + { '_id': item['_id'], + 'carted.cart_id': cart_id, + 'carted.qty': carted_item['qty'] }, + { '$inc': { 'qty': carted_item['qty'] }, + '$pull': { 'carted': { 'cart_id': cart_id } } }) + +Note that the function above is safe, as it checks to be sure the cart +is expired or expiring before removing items from the cart and returning +them to inventory. This function could, however, be slow as well as +slowing down other updates and queries, so it should be used +infrequently. + +Sharding +======== + +If you choose to shard this system, the use of an ``_id`` field for most of +our updates makes ``_id`` an ideal sharding candidate, for both carts and +products. Using ``_id`` as your shard key allows all updates that query on +``_id`` to be routed to a single mongod process. There are two potential +drawbacks with using ``_id`` as a shard key, however. + +- If the cart collection's ``_id`` is generated in a generally increasing + order, new carts will all initially be assigned to a single shard. +- Cart expiration and inventory adjustment requires several broadcast + queries and updates if ``_id`` is used as a shard key. + +It turns out you can mitigate the first pitfall by choosing a random +value (perhaps the sha-1 hash of an ``ObjectId``) as the ``_id`` of each cart +as it is created. The second objection is valid, but relatively +unimportant, as the expiration function runs relatively infrequently and can be +slowed down by the judicious use of ``sleep()`` calls in order to +minimize server load. + +The sharding commands you'd use to shard the cart and inventory +collections, then, would be the following: + +.. code-block:: python + + >>> db.command('shardcollection', 'inventory') + { "collectionsharded" : "inventory", "ok" : 1 } + >>> db.command('shardcollection', 'cart') + { "collectionsharded" : "cart", "ok" : 1 } + +Note that there is no need to specify the shard key, as MongoDB will +default to using ``_id`` as a shard key. diff --git a/source/tutorial/usecase/ecommerce-product-catalog.txt b/source/tutorial/usecase/ecommerce-product-catalog.txt new file mode 100644 index 00000000000..d225bfd2b67 --- /dev/null +++ b/source/tutorial/usecase/ecommerce-product-catalog.txt @@ -0,0 +1,516 @@ +=========================== +E-Commerce: Product Catalog +=========================== + +Problem +======= + +You have a product catalog that you would like to store in MongoDB with +products of various types and various relevant attributes. 
+ +Solution Overview +================= + +In the relational database world, there are several solutions of varying +performance characteristics used to solve this problem. This section +examines a few options and then describes the solution enabled by MongoDB. + +One approach ("concrete table inheritance") to solving this problem is +to create a table for each product category: + +.. code-block:: sql + + CREATE TABLE `product_audio_album` ( + `sku` char(8) NOT NULL, + ... + `artist` varchar(255) DEFAULT NULL, + `genre_0` varchar(255) DEFAULT NULL, + `genre_1` varchar(255) DEFAULT NULL, + ..., + PRIMARY KEY(`sku`)) + ... + CREATE TABLE `product_film` ( + `sku` char(8) NOT NULL, + ... + `title` varchar(255) DEFAULT NULL, + `rating` char(8) DEFAULT NULL, + ..., + PRIMARY KEY(`sku`)) + ... + +The main problem with this approach is a lack of flexibility. Each time +you add a new product category, you need to create a new table. +Furthermore, queries must be tailored to the exact type of product +expected. + +Another approach ("single table inheritance") is to use a single +table for all products and add new columns each time you need to store +a new type of product: + +.. code-block:: sql + + CREATE TABLE `product` ( + `sku` char(8) NOT NULL, + ... + `artist` varchar(255) DEFAULT NULL, + `genre_0` varchar(255) DEFAULT NULL, + `genre_1` varchar(255) DEFAULT NULL, + ... + `title` varchar(255) DEFAULT NULL, + `rating` char(8) DEFAULT NULL, + ..., + PRIMARY KEY(`sku`)) + +This is more flexible, allowing queries to span different types of +product, but it's quite wasteful of space. One possible space +optimization would be to name the columns generically (``str_0``, ``str_1``, +etc.,) but then you lose visibility into the meaning of the actual data in +the columns. + +Multiple table inheritance is yet another approach where common attributes are +represented in a generic 'product' table and the variations in individual +category product tables: + +.. code-block:: sql + + CREATE TABLE `product` ( + `sku` char(8) NOT NULL, + `title` varchar(255) DEFAULT NULL, + `description` varchar(255) DEFAULT NULL, + `price`, ... + PRIMARY KEY(`sku`)) + + CREATE TABLE `product_audio_album` ( + `sku` char(8) NOT NULL, + ... + `artist` varchar(255) DEFAULT NULL, + `genre_0` varchar(255) DEFAULT NULL, + `genre_1` varchar(255) DEFAULT NULL, + ..., + PRIMARY KEY(`sku`), + FOREIGN KEY(`sku`) REFERENCES `product`(`sku`)) + ... + CREATE TABLE `product_film` ( + `sku` char(8) NOT NULL, + ... + `title` varchar(255) DEFAULT NULL, + `rating` char(8) DEFAULT NULL, + ..., + PRIMARY KEY(`sku`), + FOREIGN KEY(`sku`) REFERENCES `product`(`sku`)) + ... + +This is more space-efficient than single-table inheritance and somewhat +more flexible than concrete-table inheritance, but it does require a +minimum of one join to actually obtain all the attributes relevant to a +product. + +Entity-attribute-value schemas are yet another solution, basically +creating a meta-model for your product data. In this approach, you +maintain a table with (``entity_id``, ``attribute_id``, ``value``) triples that +describe each product. For instance, suppose you are describing an audio +album. 
In that case you might have a series of rows representing the +following relationships: + ++-----------------+-------------+------------------+ +| Entity | Attribute | Value | ++=================+=============+==================+ +| sku_00e8da9b | type | Audio Album | ++-----------------+-------------+------------------+ +| sku_00e8da9b | title | A Love Supreme | ++-----------------+-------------+------------------+ +| sku_00e8da9b | ... | ... | ++-----------------+-------------+------------------+ +| sku_00e8da9b | artist | John Coltrane | ++-----------------+-------------+------------------+ +| sku_00e8da9b | genre | Jazz | ++-----------------+-------------+------------------+ +| sku_00e8da9b | genre | General | ++-----------------+-------------+------------------+ +| ... | ... | ... | ++-----------------+-------------+------------------+ + +This schema has the advantage of being completely flexible; any entity +can have any set of any attributes. New product categories do not +require *any* changes in the DDL for your database. The downside to this +schema is that any nontrivial query requires large numbers of join +operations, which results in a large performance penalty. + +One other approach that has been used in relational world is to "punt" +so to speak on the product details and serialize them all into a ``BLOB`` +column. The problem with this approach is that the details become +difficult to search and sort by. (One exception is with Oracle's ``XMLTYPE`` +columns, which actually resemble a NoSQL document database.) + +The approach best suited to MongoDB is to use a single collection to store all +the product data, similar to single-table inheritance. Due to MongoDB's +dynamic schema, however, you need not conform each document to the same +schema. This allows you to tailor each product's document to only contain +attributes relevant to that product category. + +Schema Design +============= + +Your schema should contain general product information that needs to be +searchable across all products at the beginning of each document, with +properties that vary from category to category encapsulated in a +'details' property. Thus an audio album might look like the following: + +.. code-block:: javascript + + { + sku: "00e8da9b", + type: "Audio Album", + title: "A Love Supreme", + description: "by John Coltrane", + asin: "B0000A118M", + + shipping: { + weight: 6, + dimensions: { + width: 10, + height: 10, + depth: 1 + }, + }, + + pricing: { + list: 1200, + retail: 1100, + savings: 100, + pct_savings: 8 + }, + + details: { + title: "A Love Supreme [Original Recording Reissued]", + artist: "John Coltrane", + genre: [ "Jazz", "General" ], + ... + tracks: [ + "A Love Supreme Part I: Acknowledgement", + "A Love Supreme Part II - Resolution", + "A Love Supreme, Part III: Pursuance", + "A Love Supreme, Part IV-Psalm" + ], + }, + } + +A movie title would have the same fields stored for general product +information, shipping, and pricing, but have quite a different details +attribute: + +.. code-block:: javascript + + { + sku: "00e8da9d", + type: "Film", + ..., + asin: "B000P0J0AQ", + + shipping: { ... }, + + pricing: { ... 
},
+
+    details: {
+      title: "The Matrix",
+      director: [ "Andy Wachowski", "Larry Wachowski" ],
+      writer: [ "Andy Wachowski", "Larry Wachowski" ],
+      ...,
+      aspect_ratio: "1.66:1"
+    },
+  }
+
+Another thing to note in the MongoDB schema is that you can have
+multi-valued attributes without any arbitrary restriction on the number
+of values (as you might have if you had ``genre_0`` and ``genre_1``
+columns in a relational database, for instance) and without the need for
+a join (as you might have if you normalized the many-to-many "genre"
+relation).
+
+Operations
+==========
+
+You'll be using the product catalog mainly to perform search
+operations. Thus the focus in this section will be on the various types
+of queries you might want to support in an e-commerce site. These
+examples will be written in the Python programming language using the
+``pymongo`` driver, but other language/driver combinations should be
+similar.
+
+Find All Jazz Albums, Sorted by Year Produced
+---------------------------------------------
+
+Here, you'd like to see a group of products with a particular genre,
+sorted by the year in which they were produced:
+
+.. code-block:: python
+
+    query = db.products.find({'type':'Audio Album',
+                              'details.genre': 'Jazz'})
+    query = query.sort([('details.issue_date', -1)])
+
+Index Support
+~~~~~~~~~~~~~
+
+In order to efficiently support this type of query, you need to create a
+compound index on all the properties used in the filter and in the sort:
+
+.. code-block:: python
+
+    db.products.ensure_index([
+        ('type', 1),
+        ('details.genre', 1),
+        ('details.issue_date', -1)])
+
+Note here that the final component of the index is the sort field. This allows
+MongoDB to traverse the index in the order in which the data is to be returned,
+rather than performing a slow in-memory sort of the data.
+
+Find All Products Sorted by Percentage Discount Descending
+----------------------------------------------------------
+
+While most searches would be for a particular type of product (audio
+album or movie, for instance), there may be cases where you'd like to
+find all products in a certain price range, perhaps for a "best daily
+deals" section of your website. In this case, you'll use the pricing
+information that exists in all products to find the products with the
+highest percentage discount:
+
+.. code-block:: python
+
+    query = db.products.find({ 'pricing.pct_savings': { '$gt': 25 } })
+    query = query.sort([('pricing.pct_savings', -1)])
+
+Index Support
+~~~~~~~~~~~~~
+
+In order to efficiently support this type of query, you'll need an index on the
+percentage savings:
+
+.. code-block:: python
+
+    db.products.ensure_index('pricing.pct_savings')
+
+Since the index is only on a single key, it does not matter in which
+order the index is sorted. Note that, had you wanted to perform a range
+query (say all products over $25 retail) and sort by another property
+(perhaps percentage savings), MongoDB would not have been able to use an
+index as effectively. A range query or a sort must always be on the *last*
+property in a compound index if you want to avoid scanning entirely. Thus
+using a different property for a range query and a sort requires some
+degree of scanning, slowing down your query.
+
+Find All Movies in Which Keanu Reeves Acted
+-------------------------------------------
+
+In this case, you want to search inside the details of a particular type
+of product (a movie) to find all movies containing Keanu Reeves, sorted
+by date descending:
+
+..
code-block:: python + + query = db.products.find({'type': 'Film', + 'details.actor': 'Keanu Reeves'}) + query = query.sort([('details.issue_date', -1)]) + +Index Support +~~~~~~~~~~~~~ + +Here, you wish to once again index by type first, followed the details +you're interested in: + +.. code-block:: python + + db.products.ensure_index([ + ('type', 1), + ('details.actor', 1), + ('details.issue_date', -1)]) + +And once again, the final component of the index is the sort field. + +Find All Movies With the Word "Hacker" in the Title +--------------------------------------------------- + +Those experienced with relational databases may shudder at this +operation, since it implies an inefficient LIKE query. In fact, without +a full-text search engine, some scanning will always be required to +satisfy this query. In the case of MongoDB, the solution is to use a regular +expression. In Python, you can use the ``re`` module to construct the query: + +.. code-block:: python + + import re + re_hacker = re.compile(r'.*hacker.*', re.IGNORECASE) + + + query = db.products.find({'type': 'Film', 'title': re_hacker}) + query = query.sort([('details.issue_date', -1)]) + +Although this is fairly convenient, MongoDB also provides the option to +use a special syntax rather than importing the Python ``re`` module: + +.. code-block:: python + + query = db.products.find({ + 'type': 'Film', + 'title': {'$regex': '.*hacker.*', '$options':'i'}}) + query = query.sort([('details.issue_date', -1)]) + +Index Support +~~~~~~~~~~~~~ + +Here, the best index diverges a bit from the previous index orders: + +.. code-block:: python + + db.products.ensure_index([ + ('type', 1), + ('details.issue_date', -1), + ('title', 1)]) + +You may be wondering why you should include the title field in the index +if MongoDB has to scan anyway. The reason is that there are two types of +scans: index scans and document scans. Document scans require entire +documents to be loaded into memory, while index scans only require index +entries to be loaded. So while an index scan on title isn't as efficient +as a direct lookup, it is certainly faster than a document scan. + +The order in which you include the index keys is also different than what +you might expect. This is once again due to the fact that you're +scanning. Since the results need to be in sorted order by +``'details.issue_date``, you should make sure that's the order in which +MongoDB scans titles. You can observe the difference looking at the +query plans for different orderings. If you use the (``type``, ``title``, +``details.issue_date``) index, you get the following plan: + +.. code-block:: python + :emphasize-lines: 11,17 + + {u'allPlans': [...], + u'cursor': u'BtreeCursor type_1_title_1_details.issue_date_-1 multi', + u'indexBounds': {u'details.issue_date': [[{u'$maxElement': 1}, + {u'$minElement': 1}]], + u'title': [[u'', {}], + [<_sre.SRE_Pattern object at 0x2147cd8>, + <_sre.SRE_Pattern object at 0x2147cd8>]], + u'type': [[u'Film', u'Film']]}, + u'indexOnly': False, + u'isMultiKey': False, + u'millis': 208, + u'n': 0, + u'nChunkSkips': 0, + u'nYields': 0, + u'nscanned': 10000, + u'nscannedObjects': 0, + u'scanAndOrder': True} + +If, however, you use the (``type``, ``details.issue_date``, ``title``) index, you get +the following plan: + +.. 
code-block:: python + :emphasize-lines: 11 + + {u'allPlans': [...], + u'cursor': u'BtreeCursor type_1_details.issue_date_-1_title_1 multi', + u'indexBounds': {u'details.issue_date': [[{u'$maxElement': 1}, + {u'$minElement': 1}]], + u'title': [[u'', {}], + [<_sre.SRE_Pattern object at 0x2147cd8>, + <_sre.SRE_Pattern object at 0x2147cd8>]], + u'type': [[u'Film', u'Film']]}, + u'indexOnly': False, + u'isMultiKey': False, + u'millis': 157, + u'n': 0, + u'nChunkSkips': 0, + u'nYields': 0, + u'nscanned': 10000, + u'nscannedObjects': 0} + +The two salient features to note are a) the absence of the +``scanAndOrder: True`` in the optmal query and b) the difference in time +(208ms for the suboptimal query versus 157ms for the optimal one). The +lesson learned here is that if you absolutely have to scan, you should +make the elements you're scanning the *least* significant part of the +index (even after the sort). + +Sharding +======== + +Though the performance in this system is highly dependent on the indexes, +sharding can enhance that performance further by allowing +MongoDB to keep larger portions of those indexes in RAM. In order to maximize +your read scaling, it's also nice to choose a shard key that allows +mongos to route queries to only one or a few shards rather than all the +shards globally. + +Since most of the queries in this system include type, it should probably be +included in the shard key. You may note that most of the +queries also included ``details.issue_date``, so there may be a +temptation to include it in the shard key, but this actually wouldn't +help much since none of the queries were *selective* by date. + +Since this schema is so flexible, it's hard to say *a priori* what the +ideal shard key would be, but a reasonable guess would be to include the +``type`` field, one or more detail fields that are commonly queried, and +one final random-ish field to ensure you don't get large unsplittable +chunks. For this example, assuming that ``details.genre`` is the +second-most queried field after ``type``, the sharding setup +would be as follows: + +.. code-block:: python + + >>> db.command('shardcollection', 'product', { + ... key : { 'type': 1, 'details.genre' : 1, 'sku':1 } }) + { "collectionsharded" : "details.genre", "ok" : 1 } + +One important note here is that, even if you choose a shard key that +requires all queries to be broadcast to all shards, you still get some +benefits from sharding due to a) the larger amount of memory available +to store indexes and b) the fact that searches will be parallelized +across shards, reducing search latency. + +Scaling Queries with ``read_preference`` +---------------------------------------- + +Although sharding is the best way to scale reads and writes, it's not +always possible to partition your data so that the queries can be routed +by mongos to a subset of shards. In this case, ``mongos`` will broadcast the +query to all shards and then accumulate the results before returning to +the client. In cases like this, you can still scale query performance +by allowing ``mongos`` to read from the secondary servers in a replica set. +This is achieved via the ``read_preference`` argument, and can be set at +the connection or individual query level. For instance, to allow all +reads on a connection to go to a secondary, the syntax is: + +.. code-block:: python + + conn = pymongo.Connection(read_preference=pymongo.SECONDARY) + +or + +.. 
code-block:: python + + conn = pymongo.Connection(read_preference=pymongo.SECONDARY_ONLY) + +In the first instance, reads will be distributed among all the +secondaries and the primary, whereas in the second reads will only be +sent to the secondary. To allow queries to go to a secondary on a +per-query basis, you can also specify a ``read_preference``: + +.. code-block:: python + + results = db.product.find(..., read_preference=pymongo.SECONDARY) + +or + +.. code-block:: python + + results = db.product.find(..., read_preference=pymongo.SECONDARY_ONLY) + +It is important to note that reading from a secondary can introduce a +lag between when inserts and updates occur and when they become visible +to queries. In the case of a product catalog, however, where queries +happen frequently and updates happen infrequently, such eventual +consistency (updates visible within a few seconds but not immediately) +is usually tolerable. diff --git a/source/tutorial/usecase/img/ecommerce-category1.png b/source/tutorial/usecase/img/ecommerce-category1.png new file mode 100644 index 00000000000..0483e3785be Binary files /dev/null and b/source/tutorial/usecase/img/ecommerce-category1.png differ diff --git a/source/tutorial/usecase/img/ecommerce-category2.png b/source/tutorial/usecase/img/ecommerce-category2.png new file mode 100644 index 00000000000..399f29b81d6 Binary files /dev/null and b/source/tutorial/usecase/img/ecommerce-category2.png differ diff --git a/source/tutorial/usecase/img/ecommerce-category3.png b/source/tutorial/usecase/img/ecommerce-category3.png new file mode 100644 index 00000000000..472fb06555e Binary files /dev/null and b/source/tutorial/usecase/img/ecommerce-category3.png differ diff --git a/source/tutorial/usecase/img/ecommerce-category4.png b/source/tutorial/usecase/img/ecommerce-category4.png new file mode 100644 index 00000000000..6ec01d17af0 Binary files /dev/null and b/source/tutorial/usecase/img/ecommerce-category4.png differ diff --git a/source/tutorial/usecase/img/ecommerce-inventory1.png b/source/tutorial/usecase/img/ecommerce-inventory1.png new file mode 100644 index 00000000000..3ac5a359de1 Binary files /dev/null and b/source/tutorial/usecase/img/ecommerce-inventory1.png differ diff --git a/source/tutorial/usecase/img/rta-hierarchy1.png b/source/tutorial/usecase/img/rta-hierarchy1.png new file mode 100644 index 00000000000..68cf17dfcb6 Binary files /dev/null and b/source/tutorial/usecase/img/rta-hierarchy1.png differ diff --git a/source/tutorial/usecase/img/rta-preagg1.png b/source/tutorial/usecase/img/rta-preagg1.png new file mode 100644 index 00000000000..8a5b2d27532 Binary files /dev/null and b/source/tutorial/usecase/img/rta-preagg1.png differ diff --git a/source/tutorial/usecase/img/rta-preagg2.png b/source/tutorial/usecase/img/rta-preagg2.png new file mode 100644 index 00000000000..a8af8112f0e Binary files /dev/null and b/source/tutorial/usecase/img/rta-preagg2.png differ diff --git a/source/tutorial/usecase/index.txt b/source/tutorial/usecase/index.txt new file mode 100644 index 00000000000..b517cf60214 --- /dev/null +++ b/source/tutorial/usecase/index.txt @@ -0,0 +1,14 @@ +Use Cases +============ + +.. 
toctree:: + :maxdepth: 1 + + real-time-analytics-storing-log-data + real-time-analytics-preaggregated-reports + real-time-analytics-hierarchical-aggregation + ecommerce-product-catalog + ecommerce-inventory-management + ecommerce-category-hierarchy + cms-metadata-and-asset-management + cms-storing-comments diff --git a/source/tutorial/usecase/real-time-analytics-hierarchical-aggregation.txt b/source/tutorial/usecase/real-time-analytics-hierarchical-aggregation.txt new file mode 100644 index 00000000000..7f149827e48 --- /dev/null +++ b/source/tutorial/usecase/real-time-analytics-hierarchical-aggregation.txt @@ -0,0 +1,513 @@ +============================================= +Real Time Analytics: Hierarchical Aggregation +============================================= + +Problem +======= + +You have a large amount of event data that you want to analyze at +multiple levels of aggregation. + +Solution Overview +================= + +This solution assumes that the incoming event data is already +stored in an incoming ``events`` collection. For details on how you might +get the event data into the events collection, please see :doc:`Real Time +Analytics: Storing Log Data `. + +Once the event data is in the events collection, you need to aggregate +event data to the finest time granularity you're interested in. Once that +data is aggregated, you'll use it to aggregate up to the next level of +the hierarchy, and so on. To perform the aggregations, you'll use +MongoDB's ``mapreduce`` command. The schema will use several collections: +the raw data (event) logs and collections for statistics aggregated +hourly, daily, weekly, monthly, and yearly. This solution uses a hierarchical +approach to running your map-reduce jobs. The input and output of each +job is illustrated below: + +.. figure:: img/rta-hierarchy1.png + :align: center + :alt: Hierarchy + + Hierarchy of statistics collected + +Note that the events rolling into the hourly collection is qualitatively +different than the hourly statistics rolling into the daily collection. + +.. note:: + + **Map/reduce** is a popular aggregation algorithm that is optimized for + embarrassingly parallel problems. The psuedocode (in Python) of the + map/reduce algorithm appears below. Note that this psuedocode is for a + particular type of map/reduce where the results of the + map/reduce operation are *reduced* into the result collection, allowing + you to perform incremental aggregation which you'll need in this case. + + .. 
code-block:: python + + def map_reduce(icollection, query, + mapf, reducef, finalizef, ocollection): + '''Psuedocode for map/reduce with output type="reduce" in MongoDB''' + map_results = defaultdict(list) + def emit(key, value): + '''helper function used inside mapf''' + map_results[key].append(value) + + + # The map phase + for doc in icollection.find(query): + mapf(doc) + + + # Pull in documents from the output collection for + # output type='reduce' + for doc in ocollection.find({'_id': {'$in': map_results.keys() } }): + map_results[doc['_id']].append(doc['value']) + + + # The reduce phase + for key, values in map_results.items(): + reduce_results[key] = reducef(key, values) + + + # Finalize and save the results back + for key, value in reduce_results.items(): + final_value = finalizef(key, value) + ocollection.save({'_id': key, 'value': final_value}) + + The embarrassingly parallel part of the map/reduce algorithm lies in the + fact that each invocation of mapf, reducef, and finalizef are + independent of each other and can, in fact, be distributed to different + servers. In the case of MongoDB, this parallelism can be achieved by + using sharding on the collection on you're are performing map/reduce. + +Schema Design +============= + +When designing the schema for event storage, it's important to track whichevents +which have been included in your aggregations and events which have not yet been +included. A simple approach in a relational database would be to use an auto-increment +integer primary key, but this introduces a big performance penalty to +your event logging process as it has to fetch event keys one-by one. + +If you're able to batch up your inserts into the event table, you can +still use an auto-increment primary key by using the ``find_and_modify`` +command to generate your ``_id`` values: + +.. code-block:: python + + >>> obj = db.my_sequence.find_and_modify( + ... query={'_id':0}, + ... update={'$inc': {'inc': 50}} + ... upsert=True, + ... new=True) + >>> batch_of_ids = range(obj['inc']-50, obj['inc']) + +In most cases, however, it's sufficient to include a timestamp with +each event that you can use as a marker of which events have been +processed and which ones remain to be processed. + +This use case assumes that you +are calculating average session length for +logged-in users on a website. Your event format will thus be the +following: + +.. code-block:: javascript + + { + "userid": "rick", + "ts": ISODate('2010-10-10T14:17:22Z'), + "length":95 + } + +You want to calculate total and average session times for each user at +the hour, day, week, month, and year. In each case, you will also store +the number of sessions to enable MongoDB to incrementally recompute the +average session times. Each of your aggregate documents, then, looks like +the following: + +.. code-block:: javascript + + { + _id: { u: "rick", d: ISODate("2010-10-10T14:00:00Z") }, + value: { + ts: ISODate('2010-10-10T15:01:00Z'), + total: 254, + count: 10, + mean: 25.4 } + } + +Note in particular the timestamp field in the aggregate document. This allows you +to incrementally update the various levels of the hierarchy. + +Operations +========== + +In the discussion below, it is assumed that all the events have been +inserted and appropriately timestamped, so your main operations are +aggregating from events into the smallest aggregate (the hourly totals) +and aggregating from smaller granularity to larger granularity. 
In each case, the last time the particular aggregation is run is stored
+in a ``last_run`` variable. (This variable might be loaded from MongoDB
+or another persistence mechanism; a sketch of one way to persist it
+appears at the end of the following section.)
+
+Aggregate From Events to the Hourly Level
+-----------------------------------------
+
+Here, you want to load all the events since your last run until one minute
+ago (to allow for some lag in logging events). The first thing you
+need to do is create your map function. Even though this solution uses Python
+and ``pymongo`` to interface with the MongoDB server, note that the various
+functions (``mapf``, ``reducef``, and ``finalizef``) that are passed to the
+``mapreduce`` command must be JavaScript functions. The map function appears below:
+
+.. code-block:: python
+
+    mapf_hour = bson.Code('''function() {
+        var key = {
+            u: this.userid,
+            d: new Date(
+                this.ts.getFullYear(),
+                this.ts.getMonth(),
+                this.ts.getDate(),
+                this.ts.getHours(),
+                0, 0, 0) };
+        emit(
+            key,
+            {
+                total: this.length,
+                count: 1,
+                mean: 0,
+                ts: new Date() });
+    }''')
+
+In this case, the map function emits (key, value) pairs which contain the
+statistics you want to aggregate, as you'd expect, but it also emits a ``ts``
+value. This will be used in the cascaded aggregations
+(hour to day, etc.) to determine when a particular hourly aggregation
+was performed.
+
+The reduce function is also fairly straightforward:
+
+.. code-block:: python
+
+    reducef = bson.Code('''function(key, values) {
+        var r = { total: 0, count: 0, mean: 0, ts: null };
+        values.forEach(function(v) {
+            r.total += v.total;
+            r.count += v.count;
+        });
+        return r;
+    }''')
+
+A few things are notable here. First of all, note that the returned
+document from the reduce function has the same format as the result of
+map. This is a characteristic of map/reduce that it's nice
+to maintain, as differences in structure between map, reduce, and
+finalize results can lead to difficult-to-debug errors. Also note that
+the ``mean`` and ``ts`` values are ignored in the ``reduce`` function. These will be
+computed in the 'finalize' step:
+
+.. code-block:: python
+
+    finalizef = bson.Code('''function(key, value) {
+        if(value.count > 0) {
+            value.mean = value.total / value.count;
+        }
+        value.ts = new Date();
+        return value;
+    }''')
+
+The finalize function computes the mean value as well as the timestamp you'll
+use to write back to the output collection. Now, to bind it all together, here
+is the Python code to invoke the ``mapreduce`` command:
+
+.. code-block:: python
+
+    cutoff = datetime.utcnow() - timedelta(seconds=60)
+    query = { 'ts': { '$gt': last_run, '$lt': cutoff } }
+
+
+    db.events.map_reduce(
+        map=mapf_hour,
+        reduce=reducef,
+        finalize=finalizef,
+        query=query,
+        out={ 'reduce': 'stats.hourly' })
+
+
+    last_run = cutoff
+
+Through the use of the 'reduce' option on your output, you can safely run this
+aggregation as often as you like so long as you update the ``last_run`` variable
+each time.
+
+Index Support
+~~~~~~~~~~~~~
+
+Since you'll be running the initial query on the input events
+frequently, you'd benefit significantly from an index on the
+timestamp of incoming events:
+
+.. code-block:: python
+
+    >>> db.events.ensure_index('ts')
+
+Since you're always reading and writing the most recent events, this
+index has the advantage of being right-aligned, which basically means MongoDB
+only needs a thin slice of the index (the most recent values) in RAM to
+achieve good performance.
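+
+Because the aggregation is incremental, the ``last_run`` marker has to
+survive between invocations. As a minimal sketch of one way to keep it in
+MongoDB itself (the ``aggregation_state`` collection name and the two
+helper functions below are illustrative assumptions, not part of the
+schema above), you could store one small bookkeeping document per job:
+
+.. code-block:: python
+
+    from datetime import datetime
+
+    def load_last_run(db, job_name, default=datetime(1970, 1, 1)):
+        # Fetch the saved marker for this job, or a default if none exists
+        doc = db.aggregation_state.find_one({'_id': job_name})
+        return doc['last_run'] if doc else default
+
+    def save_last_run(db, job_name, last_run):
+        # Upsert the marker so the next run picks up where this one stopped
+        db.aggregation_state.update(
+            {'_id': job_name},
+            {'$set': {'last_run': last_run}},
+            upsert=True)
+
+With helpers like these, you might call ``load_last_run(db, 'hourly')``
+before running the hourly aggregation and ``save_last_run(db, 'hourly',
+cutoff)`` once it completes successfully.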
+
+Aggregate from Hour to Day
+--------------------------
+
+In calculating the daily statistics, you'll use the hourly statistics
+as input. The daily map function looks quite similar to the hourly map
+function:
+
+.. code-block:: python
+
+    mapf_day = bson.Code('''function() {
+        var key = {
+            u: this._id.u,
+            d: new Date(
+                this._id.d.getFullYear(),
+                this._id.d.getMonth(),
+                this._id.d.getDate(),
+                0, 0, 0, 0) };
+        emit(
+            key,
+            {
+                total: this.value.total,
+                count: this.value.count,
+                mean: 0,
+                ts: null });
+    }''')
+
+There are a few differences to note here. First of all, the aggregation key is
+(userid, date) rather than (userid, hour), to allow
+for daily aggregation. Secondly, note that the values ``emit``\ ted
+are actually the total and count values from the hourly aggregates
+rather than properties from event documents. This will be the case in
+all the higher-level hierarchical aggregations.
+
+Since you're using the same format for map output as was used in the
+hourly aggregations, you can, in fact, use the same reduce and finalize
+functions. The actual Python code driving this level of aggregation is
+as follows:
+
+.. code-block:: python
+
+    cutoff = datetime.utcnow() - timedelta(seconds=60)
+    query = { 'value.ts': { '$gt': last_run, '$lt': cutoff } }
+
+
+    db.stats.hourly.map_reduce(
+        map=mapf_day,
+        reduce=reducef,
+        finalize=finalizef,
+        query=query,
+        out={ 'reduce': 'stats.daily' })
+
+
+    last_run = cutoff
+
+There are a couple of things to note here. First of all, the query is
+not on ``ts`` now, but ``value.ts``, the timestamp written during the
+finalization of the hourly aggregates. Also note that you are, in fact,
+aggregating from the ``stats.hourly`` collection into the ``stats.daily``
+collection.
+
+Index Support
+~~~~~~~~~~~~~
+
+Since you're going to be running the initial query on the hourly
+statistics collection frequently, an index on 'value.ts' would be nice
+to have:
+
+.. code-block:: python
+
+    >>> db.stats.hourly.ensure_index('value.ts')
+
+Once again, this is a right-aligned index that will use very little RAM
+for efficient operation.
+
+Other Aggregations
+------------------
+
+Once you have your daily statistics, you can use them to calculate your
+weekly and monthly statistics. The weekly map function is as follows:
+
+.. code-block:: python
+
+    mapf_week = bson.Code('''function() {
+        var key = {
+            u: this._id.u,
+            d: new Date(
+                this._id.d.valueOf()
+                - this._id.d.getDay()*24*60*60*1000) };
+        emit(
+            key,
+            {
+                total: this.value.total,
+                count: this.value.count,
+                mean: 0,
+                ts: null });
+    }''')
+
+Here, in order to get the group key, you simply take the date and subtract
+days until you get to the beginning of the week. In the
+monthly map function, you'll use the first day of the month as the
+group key:
+
+.. code-block:: python
+
+    mapf_month = bson.Code('''function() {
+        var key = {
+            u: this._id.u,
+            d: new Date(
+                this._id.d.getFullYear(),
+                this._id.d.getMonth(),
+                1, 0, 0, 0, 0) };
+        emit(
+            key,
+            {
+                total: this.value.total,
+                count: this.value.count,
+                mean: 0,
+                ts: null });
+    }''')
+
+One thing in particular to notice about these map functions is that they
+are identical to one another except for the date calculation. You can use
+Python's string interpolation to refactor the map function definitions
+as follows:
+
+..
code-block:: python + + mapf_hierarchical = '''function() { + var key = { + u: this._id.u, + d: %s }; + emit( + key, + { + total: this.value.total, + count: this.value.count, + mean: 0, + ts: null }); + }''' + + + mapf_day = bson.Code( + mapf_hierarchical % '''new Date( + this._id.d.getFullYear(), + this._id.d.getMonth(), + this._id.d.getDate(), + 0, 0, 0, 0)''') + + + mapf_week = bson.Code( + mapf_hierarchical % '''new Date( + this._id.d.valueOf() + - dt.getDay()*24*60*60*1000)''') + + + mapf_month = bson.Code( + mapf_hierarchical % '''new Date( + this._id.d.getFullYear(), + this._id.d.getMonth(), + 1, 0, 0, 0, 0)''') + + + mapf_year = bson.Code( + mapf_hierarchical % '''new Date( + this._id.d.getFullYear(), + 1, 1, 0, 0, 0, 0)''') + +The Python driver can also be refactored so there is much less code +duplication: + +.. code-block:: python + + def aggregate(icollection, ocollection, mapf, cutoff, last_run): + query = { 'value.ts': { '$gt': last_run, '$lt': cutoff } } + icollection.map_reduce( + map=mapf, + reduce=reducef, + finalize=finalizef, + query=query, + out={ 'reduce': ocollection.name }) + +Once this is defined, you can perform all the aggregations as follows: + +.. code-block:: python + + cutoff = datetime.utcnow() - timedelta(seconds=60) + aggregate(db.events, db.stats.hourly, mapf_hour, cutoff, last_run) + aggregate(db.stats.hourly, db.stats.daily, mapf_day, cutoff, last_run) + aggregate(db.stats.daily, db.stats.weekly, mapf_week, cutoff, last_run) + aggregate(db.stats.daily, db.stats.monthly, mapf_month, cutoff, + last_run) + aggregate(db.stats.monthly, db.stats.yearly, mapf_year, cutoff, + last_run) + last_run = cutoff + +So long as you save/restore the ``last_run`` variable between +aggregations, you can run these aggregations as often as you like since +each aggregation individually is incremental. + +Index Support +~~~~~~~~~~~~~ + +Your indexes will continue to be on the value's timestamp to ensure +efficient operation of the next level of the aggregation (and they +continue to be right-aligned): + +.. code-block:: python + + >>> db.stats.daily.ensure_index('value.ts') + >>> db.stats.monthly.ensure_index('value.ts') + +Sharding +======== + +To take advantage of distinct shards when performing map/reduce, your +input collections should be sharded. In order to achieve good balancing +between nodes, you should make sure that the shard key is not +simply the incoming timestamp, but rather something that varies +significantly in the most recent documents. In this case, the username +makes sense as the most significant part of the shard key. + +In order to prevent a single, active user from creating a large, +unsplittable chunk, it's best to use a compound shard key with (username, +timestamp) on the events collection. + +.. code-block:: python + + >>> db.command('shardcollection','events', { + ... key : { 'userid': 1, 'ts' : 1} } ) + { "collectionsharded": "events", "ok" : 1 } + +In order to take advantage of sharding on +the aggregate collections, you *must* shard on the ``_id`` field (if you decide +to shard these collections:) + +.. 
code-block:: python + + >>> db.command('shardcollection', 'stats.daily') + { "collectionsharded" : "stats.daily", "ok": 1 } + >>> db.command('shardcollection', 'stats.weekly') + { "collectionsharded" : "stats.weekly", "ok" : 1 } + >>> db.command('shardcollection', 'stats.monthly') + { "collectionsharded" : "stats.monthly", "ok" : 1 } + >>> db.command('shardcollection', 'stats.yearly') + { "collectionsharded" : "stats.yearly", "ok" : 1 } + +You should also update your map/reduce driver so that it notes the output +should be sharded. This is accomplished by adding 'sharded':True to the +output argument: + +.. code-block:: python + + ... out={ 'reduce': ocollection.name, 'sharded': True })... + diff --git a/source/tutorial/usecase/real-time-analytics-preaggregated-reports.txt b/source/tutorial/usecase/real-time-analytics-preaggregated-reports.txt new file mode 100644 index 00000000000..6f13b9a088b --- /dev/null +++ b/source/tutorial/usecase/real-time-analytics-preaggregated-reports.txt @@ -0,0 +1,543 @@ +=========================================== +Real Time Analytics: Pre-Aggregated Reports +=========================================== + +Problem +======= + +You have one or more servers generating events for which you want +real-time statistical information in a MongoDB collection. + +Solution Overview +================= + +This solution assumes the following: + +- There is no need to retain transactional event data in MongoDB, or + that retention is handled outside the scope of this use case. +- You need statistical data to be up-to-the minute (or up-to-the-second, + if possible.) +- The queries to retrieve time series of statistical data need to be as + fast as possible. + +The general approach is to use upserts and increments to generate the +statistics and simple range-based queries and filters to draw the time +series charts of the aggregated data. + +This use case assumes a simple scenario where you +want to count the number of hits to a collection of web site at various +levels of time-granularity (by minute, hour, day, week, and month) as +well as by path. + +Schema Design +============= + +There are two important considerations when designing the schema for a +real-time analytics system: the ease & speed of updates and the ease & +speed of queries. In particular, you want to avoid the following +performance-killing circumstances: + +- documents changing in size significantly, causing reallocations on + disk +- queries that require large numbers of disk seeks in order to be satisfied +- document structures that make accessing a particular field slow + +One approach you *could* use to make updates easier would be to keep your +hit counts in individual documents, one document per +minute/hour/day/etc. This approach, however, requires you to query +several documents for nontrivial time range queries, slowing down your +queries significantly. In order to keep your queries fast, it's actually better +to use somewhat more complex documents, keeping several aggregate +values in each document. + +In order to illustrate some of the other issues you might encounter, the +following are several schema designs that you might try as well as discussion of +the problems each of with them. + +Design 0: One Document Per Page/Day +----------------------------------- + +Initially, you might try putting all the statistics you need into a single +document per page: + +.. 
code-block:: javascript + + { + _id: "20101010/site-1/apache_pb.gif", + metadata: { + date: ISODate("2000-10-10T00:00:00Z"), + site: "site-1", + page: "/apache_pb.gif" }, + daily: 5468426, + hourly: { + "0": 227850, + "1": 210231, + ... + "23": 20457 }, + minute: { + "0": 3612, + "1": 3241, + ... + "1439": 2819 } + } + +This approach has a couple of advantages: + +- It only requires a single update per hit to the website. +- Intra-day reports for a single page require fetching only a single document. + +There are, however, significant +problems with this approach. The biggest problem is that, as you upsert +data into the ``hy`` and ``mn`` properties, the document grows. Although +MongoDB attempts to pad the space required for documents, it will still +end up needing to reallocate these documents multiple times throughout +the day, copying the documents to areas with more space and impacting performance. + +Design #0.5: Preallocate Documents +---------------------------------- + +In order to mitigate the repeated copying of documents, you can tweak your +approach slightly by adding a process which will preallocate a document +with initial zeros during the previous day. In order to avoid a +situation where you preallocate documents *en masse* at midnight, it's best to +randomly (with a low probability) upsert the next day's document each +time you update the current day's statistics. This requires some tuning; +you'd like to have almost all the documents preallocated by the end of +the day, without spending much time on extraneous upserts (preallocating +a document that's already there). A reasonable first guess would be to +look at the average number of hits per day (call it *hits*) and +preallocate with a probability of *1/hits*. + +Preallocating helps performance mainly by ensuring that all the various 'buckets' +are initialized with 0 hits. Once the document is initialized, then, it +will never dynamically grow, meaning a) there is no need to perform the +reallocations that could slow us down in design #0 and b) MongoDB +doesn't need to pad the records, leading to a more compact +representation and better usage of your memory. + +Design #1: Add Intra-Document Hierarchy +--------------------------------------- + +One thing to be aware of with BSON is that documents are stored as a +sequence of (key, value) pairs, *not* as a hash table. What this means +for us is that writing to ``stats.mn.0`` is *much* faster than writing to +``stats.mn.1439``. + +.. figure:: img/rta-preagg1.png + :align: center + :alt: BSON memory layout (unoptimized) + + In order to update the value in minute #1349, MongoDB must skip over all 1349 + entries before it. + +In order to speed this up, you can introduce some intra-document +hierarchy. In particular, you can split the ``mn`` field up into 24 hourly +fields: + +.. code-block:: javascript + + { + _id: "20101010/site-1/apache_pb.gif", + metadata: { + date: ISODate("2000-10-10T00:00:00Z"), + site: "site-1", + page: "/apache_pb.gif" }, + daily: 5468426, + hourly: { + "0": 227850, + "1": 210231, + ... + "23": 20457 }, + minute: { + "0": { + "0": 3612, + "1": 3241, + ... + "59": 2130 }, + "1": { + "60": ... , + }, + ... + "23": { + ... + "1439": 2819 } + } + } + +This allows MongoDB to "skip forward" when updating the minute +statistics later in the day, making your performance more uniform and +generally faster. + +.. 
figure:: img/rta-preagg2.png + :align: center + :alt: BSON memory layout (optimized) + + To update the value in minute #1349, MongoDB first skips the first 23 hours and + then skips 59 minutes for only 82 skips as opposed to 1439 skips in the + previous schema. + +Design #2: Create separate documents for different granularities +---------------------------------------------------------------- + +Design #1 is certainly a reasonable design for storing intraday +statistics, but what happens when you want to draw a historical chart +over a month or two? In that case, you need to fetch 30+ individual +documents containing or daily statistics. A better approach would be to +store daily statistics in a separate document, aggregated to the month. +This does introduce a second upsert to the statistics generation side of +your system, but the reduction in disk seeks on the query side should +more than make up for it. At this point, the document structure is as +follows: + +Daily Statistics +~~~~~~~~~~~~~~~~ + +.. code-block:: javascript + + { + _id: "20101010/site-1/apache_pb.gif", + metadata: { + date: ISODate("2000-10-10T00:00:00Z"), + site: "site-1", + page: "/apache_pb.gif" }, + hourly: { + "0": 227850, + "1": 210231, + ... + "23": 20457 }, + minute: { + "0": { + "0": 3612, + "1": 3241, + ... + "59": 2130 }, + "1": { + "0": ..., + }, + ... + "23": { + "59": 2819 } + } + } + +Monthly Statistics +~~~~~~~~~~~~~~~~~~ + +.. code-block: javascript:: + + { + _id: "201010/site-1/apache_pb.gif", + metadata: { + date: ISODate("2000-10-00T00:00:00Z"), + site: "site-1", + page: "/apache_pb.gif" }, + daily: { + "1": 5445326, + "2": 5214121, + ... } + } + +This is actually the schema design that this use case uses, since it allows for a +good balance between update efficiency and query performance. + +Operations +========== + +In this system, you'd like to balance between read performance and write +(upsert) performance. This section will describe each of the major +operations you might perform, using the Python programming language and the +pymongo MongoDB driver. These operations would be similar in other +languages as well. + +Log a Hit to a Page +------------------- + +Logging a hit to a page in your website is the main "write" activity in +your system. In order to maximize performance, you'll be doing in-place +updates with the upsert operation: + +.. 
code-block:: python + + from datetime import datetime, time + + + def log_hit(db, dt_utc, site, page): + + + # Update daily stats doc + id_daily = dt_utc.strftime('%Y%m%d/') + site + page + hour = dt_utc.hour + minute = dt_utc.minute + + + # Get a datetime that only includes date info + d = datetime.combine(dt_utc.date(), time.min) + query = { + '_id': id_daily, + 'metadata': { 'date': d, 'site': site, 'page': page } } + update = { '$inc': { + 'hourly.%d' % (hour,): 1, + 'minute.%d.%d' % (hour,minute): 1 } } + db.stats.daily.update(query, update, upsert=True) + + + # Update monthly stats document + id_monthly = dt_utc.strftime('%Y%m/') + site + page + day_of_month = dt_utc.day + query = { + '_id': id_monthly, + 'metadata': { + 'date': d.replace(day=1), + 'site': site, + 'page': page } } + update = { '$inc': { + 'daily.%d' % day_of_month: 1} } + db.stats.monthly.update(query, update, upsert=True) + +Since you're using the upsert operation, this function will perform +correctly whether the document is already present or not, which is +important, as your preallocation (the next operation) will only +preallocate documents with a high probability, not 100% certainty. Note however, +that without preallocation, you end up with a dynamically growing document, +slowing down your upserts significantly as documents are moved in order +to grow them. + +Preallocate +----------- + +In order to keep your documents from growing, you can preallocate them +before they are needed. When preallocating, you set all the statistics to +zero for all time periods so that later, so that the document doesn't need to +grow to accomodate the upserts. Here, is this preallocation as its +own function: + +.. code-block:: python + + def preallocate(db, dt_utc, site, page): + + + # Get id values + id_daily = dt_utc.strftime('%Y%m%d/') + site + page + id_monthly = dt_utc.strftime('%Y%m/') + site + page + + + # Get daily metadata + daily_metadata = { + 'date': datetime.combine(dt_utc.date(), time.min), + 'site': site, + 'page': page } + # Get monthly metadata + monthly_metadata = { + 'date': daily_m['d'].replace(day=1), + 'site': site, + 'page': page } + + + # Initial zeros for statistics + hourly = dict((str(i), 0) for i in range(24)) + minute = dict( + (str(i), dict((str(j), 0) for j in range(60))) + for i in range(24)) + daily = dict((str(i), 0) for i in range(1, 32)) + + + # Perform upserts, setting metadata + db.stats.daily.update( + { + '_id': id_daily, + 'hourly': hourly, + 'minute': minute}, + { '$set': { 'metadata': daily_metadata }}, + upsert=True) + db.stats.monthly.update( + { + '_id': id_monthly, + 'daily': daily }, + { '$set': { 'm': monthly_metadata }}, + upsert=True) + +In this case, note that the function went ahead and preallocated the monthly +document while preallocating the daily document. While you could +split this into its own function and preallocated monthly documents +less frequently that daily documents, the performance difference is +negligible, so the above code is reasonable solution. + +The next question you must answer is *when* you should preallocate. You'd +like to have a high likelihood of the document being preallocated before +it is needed, but you don't want to preallocate all at once (say at +midnight) to ensure you don't create a spike in activity and a +corresponding increase in latency. 
The solution here is to
+probabilistically preallocate each time a hit is logged, with a probability
+tuned to make preallocation likely without performing too many
+unnecessary calls to preallocate:
+
+.. code-block:: python
+
+    import random
+    from datetime import datetime, timedelta, time
+
+
+    # Example probability based on 500k hits per day per page
+    prob_preallocate = 1.0 / 500000
+
+
+    def log_hit(db, dt_utc, site, page):
+        if random.random() < prob_preallocate:
+            preallocate(db, dt_utc + timedelta(days=1), site, page)
+        # Update daily stats doc
+        ...
+
+Now with a high probability, you'll preallocate each document before
+it's used, preventing the midnight spike as well as eliminating the
+movement of dynamically growing documents.
+
+Get Data for a Real-Time Chart
+------------------------------
+
+One chart that you might be interested in seeing would be the number of
+hits to a particular page over the last hour. In that case, the query is
+fairly straightforward:
+
+.. code-block:: python
+
+    >>> db.stats.daily.find_one(
+    ...     {'metadata': {'date':dt, 'site':'site-1', 'page':'/foo.gif'}},
+    ...     { 'minute': 1 })
+
+Likewise, you can get the number of hits to a page over the last day,
+with hourly granularity:
+
+.. code-block:: python
+
+    >>> db.stats.daily.find_one(
+    ...     {'metadata': {'date':dt, 'site':'site-1', 'page':'/foo.gif'}},
+    ...     { 'hourly': 1 })
+
+If you want a few days' worth of hourly data, you can get it using the
+following query:
+
+.. code-block:: python
+
+    >>> db.stats.daily.find(
+    ...     {
+    ...         'metadata.date': { '$gte': dt1, '$lte': dt2 },
+    ...         'metadata.site': 'site-1',
+    ...         'metadata.page': '/foo.gif'},
+    ...     { 'metadata.date': 1, 'hourly': 1 },
+    ...     sort=[('metadata.date', 1)])
+
+In this case, you're retrieving the date along with the statistics since
+it's possible (though highly unlikely) that you could have a gap of one
+day where a) you didn't happen to preallocate that day and b) there were
+no hits to the document on that day.
+
+Index Support
+~~~~~~~~~~~~~
+
+These operations would benefit significantly from indexes on the
+metadata of the daily statistics:
+
+.. code-block:: python
+
+    >>> db.stats.daily.ensure_index([
+    ...     ('metadata.site', 1),
+    ...     ('metadata.page', 1),
+    ...     ('metadata.date', 1)])
+
+Note in particular that the index is first on site, then page, then date. This
+allows you to perform the third query above (a single page over a range
+of days) quite efficiently. Having any compound index on page and date,
+of course, allows you to look up a single day's statistics efficiently.
+
+Get Data for a Historical Chart
+-------------------------------
+
+In order to retrieve daily data for a single month, you can perform the
+following query:
+
+.. code-block:: python
+
+    >>> db.stats.monthly.find_one(
+    ...     {'metadata':
+    ...         {'date':dt,
+    ...          'site': 'site-1',
+    ...          'page':'/foo.gif'}},
+    ...     { 'daily': 1 })
+
+If you want several months' worth of daily data, of course, you can do the
+same trick as above:
+
+.. code-block:: python
+
+    >>> db.stats.monthly.find(
+    ...     {
+    ...         'metadata.date': { '$gte': dt1, '$lte': dt2 },
+    ...         'metadata.site': 'site-1',
+    ...         'metadata.page': '/foo.gif'},
+    ...     { 'metadata.date': 1, 'daily': 1 },
+    ...     sort=[('metadata.date', 1)])
+
+Index Support
+~~~~~~~~~~~~~
+
+Once again, these operations would benefit significantly from indexes on
+the metadata of the monthly statistics:
+
+.. code-block:: python
+
+    >>> db.stats.monthly.ensure_index([
+    ...     ('metadata.site', 1),
+    ...
('metadata.page', 1), + ... ('metadata.date', 1)]) + +The order of the index is once again designed to efficiently support +range queries for a single page over several months, as above. + +Sharding +======== + +The performance of this system will be limited by the number of shards +in your cluster as well as the choice of your shard key. The ideal shard +key balances upserts beteen the shards evenly while routing any +individual query to a single shard (or a small number of shards). A +reasonable shard key for this solution would thus be (``metadata.site``, +``metadata.page``), the site-page combination for which you're calculating +statistics: + +.. code-block:: python + + >>> db.command('shardcollection', 'stats.daily', { + ... key : { 'metadata.site': 1, 'metadata.page' : 1 } }) + { "collectionsharded" : "stats.daily", "ok" : 1 } + >>> db.command('shardcollection', 'stats.monthly', { + ... key : { 'metadata.site': 1, 'metadata.page' : 1 } }) + { "collectionsharded" : "stats.monthly", "ok" : 1 } + +One downside to using (``metadata.site``,``metadata.page``) as your shard +key is that, if one page dominates all your traffic, all updates to that +page will go to a single shard. The problem, however, is largely +unavoidable, since all update for a single page are going to a single +*document*. + +You also have the problem using only (``metadata.site``, ``metadata.page``) +shard key that, if a high percentage of your queries go to the same page, +these will all be handled by the same shard. A (slightly) better shard +key would the include the date as well as the site/page so that you could +serve different historical ranges with different shards: + +.. code-block:: python + + >>> db.command('shardcollection', 'stats.daily', { + ... key:{'metadata.site':1,'metadata.page':1,'metadata.date':1}}) + { "collectionsharded" : "stats.daily", "ok" : 1 } + >>> db.command('shardcollection', 'stats.monthly', { + ... key:{'metadata.site':1,'metadata.page':1,'metadata.date':1}}) + { "collectionsharded" : "stats.monthly", "ok" : 1 } + +It is worth noting in this discussion of sharding that, depending on the +number of sites/pages you are tracking and the number of hits per page, +you're talking about a fairly small set of data with modest performance +requirements, so sharding may be overkill. In the case of the MongoDB +Monitoring Service (MMS), a single shard is able to keep up with the +totality of traffic generated by all the customers using this (free) +service. diff --git a/source/tutorial/usecase/real-time-analytics-storing-log-data.txt b/source/tutorial/usecase/real-time-analytics-storing-log-data.txt new file mode 100644 index 00000000000..9e24872de2c --- /dev/null +++ b/source/tutorial/usecase/real-time-analytics-storing-log-data.txt @@ -0,0 +1,587 @@ +===================================== +Real Time Analytics: Storing Log Data +===================================== + +Problem +======= + +You have one or more servers generating events that you would like to +persist to a MongoDB collection. + +Solution Overview +================= + +This solution will assume that each server generating events, as well as the +consumer of the event data, has access to the MongoDB server(s). Furthermore, +this design will optimize based on the assumption that the query +rate is (substantially) lower than the insert rate (as is most often the +case when logging a high-bandwidth event stream). 
+ +Schema Design +============= + +The schema design in this case will depend largely on the particular +format of the event data you want to store. For a simple example, let's +take standard request logs from the Apache web server using the combined +log format. This example assumes you're using an uncapped +collection to store the event data. A line from such a log file might +look like the following: + +.. code-block:: text + + 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "[http://www.example.com/start.html](http://www.example.com/start.html)" "Mozilla/4.08 [en] (Win98; I ;Nav)" + +The simplest approach to storing the log data would be putting the exact +text of the log record into a document: + +.. code-block:: javascript + + { + _id: ObjectId('4f442120eb03305789000000'), + line: '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "[http://www.example.com/start.html](http://www.example.com/start.html)" "Mozilla/4.08 [en] (Win98; I ;Nav)"' + } + +While this is a possible solution, it's not likely to be the optimal +solution. For instance, if you decided you wanted to find events that hit +the same page, you'd need to use a regular expression query, which +would require a full collection scan. A better approach would be to +extract the relevant fields into individual properties. When doing the +extraction, it's important to pay attention to the choice of data types for the +various fields. For instance, the date field in the log line +``[10/Oct/2000:13:55:36 -0700]`` is 28 bytes long. If you instead store +this as a UTC timestamp, it shrinks to 8 bytes. Storing the date as a +timestamp also allows you to make date range +queries, whereas comparing two date *strings* is nearly useless. A +similar argument applies to numeric fields; storing them as strings is +suboptimal, taking up more space and making the appropriate types of +queries much more difficult. + +It's also important to consider what information you might want to omit from the +log record. For instance, if you wanted to record exactly what was in the +log record, you might create a document like the following: + +.. code-block:: javascript + + { + _id: ObjectId('4f442120eb03305789000000'), + host: "127.0.0.1", + logname: null, + user: 'frank', + time: , + request: "GET /apache_pb.gif HTTP/1.0", + status: 200, + response_size: 2326, + referer: "[http://www.example.com/start.html](http://www.example.com/start.html)", + user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)" + } + + +In most cases, however, you're probably are only interested in a subset of +the data about the request. Here, you may want to keep the host, time, +path, user agent, and referer for a web analytics application: + +.. code-block:: javascript + + { + _id: ObjectId('4f442120eb03305789000000'), + host: "127.0.0.1", + time: ISODate("2000-10-10T20:55:36Z"), + path: "/apache_pb.gif", + referer: "[http://www.example.com/start.html](http://www.example.com/start.html)", + user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)" + } + +It might even be possible to remove the time, since ``ObjectId``\ s embed +the time they are created: + +.. 
+
+It might even be possible to remove the time, since ``ObjectId``\ s embed
+the time they are created:
+
+.. code-block:: javascript
+
+   {
+     _id: ObjectId('4f442120eb03305789000000'),
+     host: "127.0.0.1",
+     path: "/apache_pb.gif",
+     referer: "http://www.example.com/start.html",
+     user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
+   }
+
+System Architecture
+-------------------
+
+Event logging systems are mainly concerned with two performance
+questions: 1) how many inserts per second the system can perform (this
+limits its event throughput) and 2) how the system will manage the growth
+of event data. Concerning insert performance, the best way to scale the
+architecture is via sharding.
+
+Operations
+==========
+
+The performance-critical operation in storing an event log is insertion.
+However, you also need to be able to query the event data for relevant
+statistics. This section describes each of these operations, using the
+Python programming language and the ``pymongo`` MongoDB driver. These
+operations would be similar in other languages as well.
+
+Inserting a Log Record
+----------------------
+
+In many event logging applications, you might accept some degree of risk
+when it comes to dropping events. In others, you need to be absolutely
+sure you don't drop any events. MongoDB supports both models. In the case
+where you can tolerate a risk of loss, you can insert records
+*asynchronously* using a fire-and-forget model:
+
+.. code-block:: python
+
+   >>> import bson
+   >>> import pymongo
+   >>> from datetime import datetime
+   >>> conn = pymongo.Connection()
+   >>> db = conn.event_db
+   >>> event = {
+   ...     '_id': bson.ObjectId(),
+   ...     'host': "127.0.0.1",
+   ...     'time': datetime(2000,10,10,20,55,36),
+   ...     'path': "/apache_pb.gif",
+   ...     'referer': "http://www.example.com/start.html",
+   ...     'user_agent': "Mozilla/4.08 [en] (Win98; I ;Nav)"
+   ... }
+   >>> db.events.insert(event, safe=False)
+
+This is the fastest approach, as this code doesn't even require a
+round-trip to the MongoDB server to ensure that the insert was received.
+It is thus also the riskiest approach, as network failures and server
+errors (such as a ``DuplicateKeyError`` on a unique index) will go
+undetected. If you want to make sure you have an acknowledgement from the
+server that your insertion succeeded (for some definition of success), you
+can pass ``safe=True``:
+
+.. code-block:: python
+
+   >>> db.events.insert(event, safe=True)
+
+If your tolerance for data loss risk is somewhat lower, you can require
+that the server to which you write the data has committed the event to
+the on-disk journal before you continue operation (``safe=True`` is
+implied by all of the following options):
+
+.. code-block:: python
+
+   >>> db.events.insert(event, j=True)
+
+Finally, if you have *extremely low* tolerance for event data loss, you
+can require the data to be replicated to additional members of the
+replica set before returning:
+
+.. code-block:: python
+
+   >>> db.events.insert(event, w=2)
+
+In this case, you will get acknowledgement only after the data has been
+written to two members of the replica set. You can combine options as
+well:
+
+.. code-block:: python
+
+   >>> db.events.insert(event, j=True, w=2)
+
+In this case, the insert waits on both a journal commit *and* a
+replication acknowledgement. Although this is the safest option, it is
+also the slowest, so you should be aware of the trade-off when
+performing your inserts.
+
+.. note::
+
+   If at all possible in your application architecture, you should
+   consider using bulk inserts to insert event data. All the options
+   discussed above apply to bulk inserts, but you can pass multiple
+   events as the first parameter to ``insert()``. By passing multiple
+   documents in a single ``insert()`` call, MongoDB is able to amortize
+   the performance penalty you incur by using 'safe' options such as
+   ``j=True`` or ``w=2``.
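+
+As a concrete illustration of the note above, a bulk insert might look
+like the following; the batch contents are invented for the example, and
+any of the write-concern options shown earlier apply to the whole batch:
+
+.. code-block:: python
+
+   >>> # A hypothetical batch of pre-parsed event documents
+   >>> batch = [
+   ...     { 'host': '127.0.0.1', 'time': datetime(2000,10,10,20,55,36),
+   ...       'path': '/apache_pb.gif' },
+   ...     { 'host': '127.0.0.1', 'time': datetime(2000,10,10,20,55,39),
+   ...       'path': '/index.html' },
+   ... ]
+   >>> # A single call (and a single journal commit) covers the whole batch
+   >>> db.events.insert(batch, j=True)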
+
+Finding All the Events for a Particular Page
+--------------------------------------------
+
+For a web analytics-type application, getting the logs for a particular
+web page might be a common operation for which you would want to
+optimize. In this case, the query would be as follows:
+
+.. code-block:: python
+
+   >>> q_events = db.events.find({'path': '/apache_pb.gif'})
+
+Note that the sharding setup you use (should you decide to shard this
+collection) has performance implications for this operation. For
+instance, if you shard on the 'path' property, then this query will be
+handled by a single shard, whereas if you shard on some other property or
+combination of properties, the ``mongos`` instance will be forced to do a
+scatter/gather operation which involves *all* the shards.
+
+Index Support
+~~~~~~~~~~~~~
+
+This operation would benefit significantly from an index on the 'path'
+attribute:
+
+.. code-block:: python
+
+   >>> db.events.ensure_index('path')
+
+One potential downside to this index is that it is relatively randomly
+distributed, meaning that for efficient operation the entire index
+should be resident in RAM. Since there is likely to be a relatively
+small number of distinct paths in the index, however, this will probably
+not be a problem.
+
+Finding All the Events for a Particular Date
+--------------------------------------------
+
+You may also want to find all the events for a particular date. In this
+case, you'd perform the following query:
+
+.. code-block:: python
+
+   >>> q_events = db.events.find({'time':
+   ...     { '$gte': datetime(2000,10,10), '$lt': datetime(2000,10,11) } })
+
+Index Support
+~~~~~~~~~~~~~
+
+In this case, an index on 'time' would provide optimal performance:
+
+.. code-block:: python
+
+   >>> db.events.ensure_index('time')
+
+One of the nice things about this index is that it is *right-aligned*.
+Since you are always inserting events in ascending time order, the
+right-most slice of the B-tree will always be resident in RAM. So long
+as your queries focus mainly on recent events, the *only* part of the
+index that needs to be resident in RAM is the right-most slice of the
+B-tree, allowing MongoDB to keep quite a large index without using up
+much of the system memory.
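+
+For example, a dashboard that only examines the last hour of traffic
+touches nothing but that right-most slice of the index. The one-hour
+window below is an arbitrary illustration:
+
+.. code-block:: python
+
+   >>> # A hypothetical "recent events" query over an arbitrary window
+   >>> from datetime import timedelta
+   >>> one_hour_ago = datetime.utcnow() - timedelta(hours=1)
+   >>> recent = db.events.find({'time': {'$gte': one_hour_ago}}).sort('time', -1)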
+
+Finding All the Events for a Particular Host/Date
+-------------------------------------------------
+
+You might also want to analyze the behavior of a particular host on a
+particular day, perhaps for analyzing suspicious behavior by a
+particular IP address. In that case, you'd write a query such as:
+
+.. code-block:: python
+
+   >>> q_events = db.events.find({
+   ...     'host': '127.0.0.1',
+   ...     'time': {'$gte': datetime(2000,10,10), '$lt': datetime(2000,10,11)}
+   ... })
+
+Index Support
+~~~~~~~~~~~~~
+
+Once again, your choice of indexes affects the performance
+characteristics of this query significantly. For instance, suppose you
+create a compound index on (time, host):
+
+.. code-block:: python
+
+   >>> db.events.ensure_index([('time', 1), ('host', 1)])
+
+In this case, the query plan would be the following (retrieved via
+``q_events.explain()``):
+
+.. code-block:: python
+
+   { ...
+     u'cursor': u'BtreeCursor time_1_host_1',
+     u'indexBounds': {u'host': [[u'127.0.0.1', u'127.0.0.1']],
+                      u'time': [
+                          [ datetime.datetime(2000, 10, 10, 0, 0),
+                            datetime.datetime(2000, 10, 11, 0, 0)]]
+     },
+     ...
+     u'millis': 4,
+     u'n': 11,
+     u'nscanned': 1296,
+     u'nscannedObjects': 11,
+     ... }
+
+If, however, you create a compound index on (host, time)...
+
+.. code-block:: python
+
+   >>> db.events.ensure_index([('host', 1), ('time', 1)])
+
+...then you get a much more efficient query plan and much better
+performance:
+
+.. code-block:: python
+
+   { ...
+     u'cursor': u'BtreeCursor host_1_time_1',
+     u'indexBounds': {u'host': [[u'127.0.0.1', u'127.0.0.1']],
+                      u'time': [[datetime.datetime(2000, 10, 10, 0, 0),
+                                 datetime.datetime(2000, 10, 11, 0, 0)]]},
+     ...
+     u'millis': 0,
+     u'n': 11,
+     ...
+     u'nscanned': 11,
+     u'nscannedObjects': 11,
+     ...
+   }
+
+In this case, MongoDB is able to visit just 11 entries in the index to
+satisfy the query, whereas in the first case it needed to visit 1296
+entries. This is because the query using (host, time) only needs to scan
+the index range from ``('127.0.0.1', datetime(2000,10,10))`` to
+``('127.0.0.1', datetime(2000,10,11))``, whereas the query using
+(time, host) must scan the range from ``(datetime(2000,10,10), MIN_KEY)``
+to ``(datetime(2000,10,11), MAX_KEY)``, a much larger range (in this
+case, 1296 entries) that yields correspondingly slower performance.
+
+Although the index order has an impact on the performance of the query,
+one thing to keep in mind is that an index scan is still *much* faster
+than a collection scan. Using a (time, host) index would thus still be
+much faster than an index on time alone, which would force MongoDB to
+fetch and examine 1296 *documents* rather than 1296 index entries. There
+is also the issue of right-alignedness to consider: the (time, host)
+index will be right-aligned but the (host, time) index will not, and it's
+possible that the right-alignedness of a (time, host) index will make up
+for the increased number of index entries that need to be visited to
+satisfy this query.
+
+Counting the Number of Requests by Day and Page
+-----------------------------------------------
+
+MongoDB 2.1 introduced a new aggregation framework that allows you to
+perform queries that aggregate large numbers of documents significantly
+faster than the old 'mapreduce' and 'group' commands in prior versions
+of MongoDB. Suppose, for instance, you'd like to find out how many
+requests there were for each day and page over the last month. In this
+case, you could build up the following aggregation pipeline:
+
+.. code-block:: python
+
+   >>> result = db.command('aggregate', 'events', pipeline=[
+   ...     { '$match': {
+   ...         'time': {
+   ...             '$gte': datetime(2000,10,1),
+   ...             '$lt': datetime(2000,11,1) } } },
+   ...     { '$project': {
+   ...         'path': 1,
+   ...         'date': {
+   ...             'y': { '$year': '$time' },
+   ...             'm': { '$month': '$time' },
+   ...             'd': { '$dayOfMonth': '$time' } } } },
+   ...     { '$group': {
+   ...         '_id': {
+   ...             'p':'$path',
+   ...             'y': '$date.y',
+   ...             'm': '$date.m',
+   ...             'd': '$date.d' },
+   ...         'hits': { '$sum': 1 } } },
+   ...     ])
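+
+The command returns its output in the ``result`` field of the response
+document, with one document per distinct (path, day) combination. The
+values below are invented purely for illustration:
+
+.. code-block:: python
+
+   >>> result['result']
+   [ { u'_id': { u'p': u'/apache_pb.gif', u'y': 2000, u'm': 10, u'd': 10 },
+       u'hits': 3 },
+     ... ]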
+
+The performance of this aggregation is dependent, of course, on your
+choice of shard key if the collection is sharded. What you'd like to
+ensure is that all the items in a particular 'group' end up on the same
+server, which you can do by sharding on date (probably not wise, as
+discussed below) or path (possibly a good idea).
+
+Index Support
+~~~~~~~~~~~~~
+
+In this case, you want to make sure you have an index to support the
+initial ``$match`` query:
+
+.. code-block:: python
+
+   >>> db.events.ensure_index('time')
+
+If you already have an index on (time, host) as discussed above,
+however, there is no need to create a separate index on 'time' alone,
+since the (time, host) index can be used to satisfy range queries on
+'time' alone.
+
+Sharding
+========
+
+Your insertion rate is going to be limited by the number of shards you
+maintain in your cluster as well as by the choice of a shard key. The
+choice of a shard key is important because MongoDB uses *range-based
+sharding*. What you *want* is for the insertions to be balanced evenly
+among the shards, so you'd like to avoid using something like a
+timestamp, sequence number, or ``ObjectId`` as a shard key, as new
+inserts would tend to cluster around the same values (and thus the same
+shard). But you also *want* each of your queries to be routed to a
+single shard. The following sections weigh the pros and cons of several
+approaches.
+
+Option 0: Shard on Time
+-----------------------
+
+Although an ``ObjectId`` or timestamp might seem to be an attractive
+sharding key at first, particularly given the right-alignedness of the
+index, it turns out to provide the worst of all worlds when it comes to
+read and write performance. In this case, all of your inserts will always
+flow to the same shard, so sharding provides no write-side performance
+benefit. Your reads will also tend to cluster in the same shard, so you
+would get no read-side performance benefit either.
+
+Option 1: Shard On a Random(ish) Key
+------------------------------------
+
+Suppose instead that you decided to shard on a key with a random
+distribution, say the md5 or sha1 hash of the ``_id`` field:
+
+.. code-block:: python
+
+   >>> from bson import Binary
+   >>> from hashlib import sha1
+   >>>
+   >>> # Introduce the synthetic shard key (this should actually be done at
+   >>> # event insertion time)
+   >>>
+   >>> for ev in db.events.find({}, {'_id': 1}):
+   ...     ssk = Binary(sha1(str(ev['_id'])).digest())
+   ...     db.events.update({'_id': ev['_id']}, {'$set': {'ssk': ssk} })
+   ...
+   >>> db.command('shardcollection', 'events', {
+   ...     key : { 'ssk' : 1 } })
+   { "collectionsharded" : "events", "ok" : 1 }
+
+This does introduce some complexity into your application in order to
+generate the random key, but it provides linear scaling on your inserts,
+so 5 shards should yield roughly a 5x speedup in inserting. The
+downsides to using a random shard key are the following: a) the shard
+key's index will tend to take up more space (and you need an index to
+determine where to place each new insert), and b) queries (unless they
+include the synthetic, random-ish shard key) will need to be distributed
+to all your shards in parallel. This may be acceptable, since in this
+scenario write performance is much more important than read
+performance, but you should be aware of the downsides to using a random
+key distribution.
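+
+Since the comment in the listing above notes that the synthetic key
+should really be generated at insertion time, here is a sketch of that
+variant; the field name ``ssk`` is simply carried over from the example
+above:
+
+.. code-block:: python
+
+   >>> from bson import Binary, ObjectId
+   >>> from hashlib import sha1
+   >>>
+   >>> # Hypothetical insert-time variant of the synthetic shard key
+   >>> event['_id'] = ObjectId()
+   >>> event['ssk'] = Binary(sha1(str(event['_id'])).digest())
+   >>> db.events.insert(event, j=True)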
+
+Option 2: Shard On a Naturally Evenly-Distributed Key
+-----------------------------------------------------
+
+In this case, you might choose to shard on the 'path' attribute, since it
+seems to be relatively evenly distributed:
+
+.. code-block:: python
+
+   >>> db.command('shardcollection', 'events', {
+   ...     key : { 'path' : 1 } })
+   { "collectionsharded" : "events", "ok" : 1 }
+
+This has a couple of advantages: a) writes tend to be evenly balanced,
+and b) reads tend to be selective (assuming they include the 'path'
+attribute in the query). There is a potential downside to this approach,
+however, particularly in the case where there are a limited number of
+distinct values for the path. In that case, you can end up with large
+shard 'chunks' that cannot be split or rebalanced because they contain
+only a single shard key value. The rule of thumb here is that you should
+not pick a shard key that allows large numbers of documents to share the
+same shard key value, since this prevents the cluster from splitting and
+rebalancing those chunks.
+
+Option 3: Combine a Natural and Synthetic Key
+---------------------------------------------
+
+This approach is perhaps the best combination of read and write
+performance for the application. You can define the shard key to be
+(path, sha1(_id)):
+
+.. code-block:: python
+
+   >>> db.command('shardcollection', 'events', {
+   ...     key : { 'path' : 1, 'ssk': 1 } })
+   { "collectionsharded" : "events", "ok" : 1 }
+
+You still need to calculate a synthetic key in the application client,
+but in return you get good write balancing as well as good read
+selectivity.
+
+Sharding Conclusion: Test With Your Own Data
+--------------------------------------------
+
+Picking a good shard key is unfortunately still one of those decisions
+that is simultaneously difficult to make, high-impact, and difficult to
+change once made. The particular mix of reading and writing, as well as
+the particular queries used, have a large impact on the performance of
+different sharding configurations. Although you can choose a reasonable
+shard key based on the considerations above, the best approach is to
+analyze the actual insertions and queries you are using in your own
+application.
+
+Variation: Capped Collections
+=============================
+
+One variation that you may want to consider, based on your data retention
+requirements, is whether you might be able to use a `capped
+collection `_ to store your events. Capped collections might be a good
+choice if you know you will process the event documents in a timely
+manner and you don't have exacting data retention requirements on the
+event data. Capped collections have the advantage of never growing beyond
+a certain size (they are allocated as a circular buffer on disk) and
+having documents 'fall out' of the buffer in their insertion order.
+Uncapped collections (the default) will persist documents until they are
+explicitly removed from the collection or the collection is dropped.
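+
+If a capped collection fits your retention requirements, you might create
+one along the following lines; the 1 GB size here is purely illustrative:
+
+.. code-block:: python
+
+   >>> # Allocate a fixed-size circular buffer for events (size is in bytes)
+   >>> db.create_collection('events', capped=True, size=2**30)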
+
+Appendix: Managing Event Data Growth
+====================================
+
+MongoDB databases, in the course of normal operation, never relinquish
+disk space back to the file system. This can create difficulties if you
+don't manage the size of your databases up front. For event data, you
+have a few options for managing the data growth:
+
+Single Collection
+-----------------
+
+This is the simplest option: keep all events in a single collection,
+periodically removing documents that you don't need any more. The
+advantage of simplicity, however, is offset by some performance
+considerations. First, when you execute the remove, MongoDB will actually
+bring the documents being removed into memory. Since these are documents
+that you presumably haven't touched in a while (that's why you're
+removing them), this will push more frequently used data out of memory.
+Second, in order to do a reasonably fast remove operation, you probably
+want to keep an index on a timestamp field. This will tend to slow down
+your inserts, as each insert has to update the index as well as write the
+event data. Finally, removing data periodically is also the option with
+the most potential for fragmenting the database, as MongoDB attempts to
+reuse the space freed by the remove operations for new events.
+
+Multiple Collections, Single Database
+-------------------------------------
+
+The next option is to periodically *rename* your event collection,
+rotating collections in much the same way you might rotate log files. You
+would then drop the oldest collection from the database. This has
+several advantages over the single collection approach. First,
+collection renames are both fast and atomic. Second, you don't actually
+have to touch any of the documents to drop a collection. Finally, since
+MongoDB allocates storage in *extents* that are owned by collections,
+dropping a collection will free up entire extents, mitigating the
+fragmentation risk. The downside to using multiple collections is
+increased complexity, since you will probably need to query both the
+current event collection and the previous event collection for any data
+analysis you perform.
+
+Multiple Databases
+------------------
+
+In the multiple database option, you take the multiple collection option
+a step further: rather than rotating your collections, you rotate your
+databases. At the cost of somewhat increased complexity in both
+insertions and queries, you gain one important benefit: as databases are
+dropped, disk space is returned to the operating system. This option only
+really makes sense if you have extremely variable event insertion rates
+or variable data retention requirements. For instance, if you are
+performing a large backfill of event data and want to make sure that the
+entire set of event data for 90 days is available during the backfill,
+but can be reduced to 30 days in ongoing operations, you might consider
+using multiple databases. The complexity cost for multiple databases,
+however, is significant, so this option should only be taken after
+thorough analysis.
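+
+A minimal sketch of the multiple-database rotation, assuming one database
+per month and a made-up naming scheme:
+
+.. code-block:: python
+
+   >>> # Write current events to this month's database (hypothetical names)...
+   >>> current = conn['events_2012_03']
+   >>> current.events.insert(event, j=True)
+   >>> # ...and reclaim disk space by dropping databases that have aged out
+   >>> # of the retention window
+   >>> conn.drop_database('events_2011_12')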