First batch of use cases #14

Merged
merged 20 commits into from
Apr 26, 2012
453 changes: 453 additions & 0 deletions source/tutorial/usecase/cms-metadata-and-asset-management.txt

Large diffs are not rendered by default.

600 changes: 600 additions & 0 deletions source/tutorial/usecase/cms-storing-comments.txt

Large diffs are not rendered by default.

249 changes: 249 additions & 0 deletions source/tutorial/usecase/ecommerce-category-hierarchy.txt
@@ -0,0 +1,249 @@
==============================
E-Commerce: Category Hierarchy
==============================

Problem
=======

You have a product category hierarchy for an e-commerce site that you want to
query frequently and update only occasionally.

Solution Overview
=================

This solution keeps each category in its own document, along with a list of its
ancestors. The category hierarchy used in this example will be
based on different categories of music:

.. figure:: img/ecommerce-category1.png
   :align: center
   :alt: Initial category hierarchy

   Initial category hierarchy

Since categories change relatively infrequently, the focus here will be on the
operations needed to keep the hierarchy up-to-date and less on the performance
aspects of updating the hierarchy.

Schema Design
=============

Each category in the hierarchy will be represented by a document. That
document will be identified by an ``ObjectId`` for internal
cross-referencing as well as a human-readable name and a URL-friendly
``slug`` property. Additionally, the schema stores an ancestors list along
with each document to facilitate displaying a category along with all
its ancestors in a single query.

.. code-block:: javascript

   { "_id" : ObjectId("4f5ec858eb03303a11000002"),
     "name" : "Modal Jazz",
     "parent" : ObjectId("4f5ec858eb03303a11000001"),
     "slug" : "modal-jazz",
     "ancestors" : [
       { "_id" : ObjectId("4f5ec858eb03303a11000001"),
         "slug" : "bop",
         "name" : "Bop" },
       { "_id" : ObjectId("4f5ec858eb03303a11000000"),
         "slug" : "ragtime",
         "name" : "Ragtime" } ]
   }

Operations
==========

This section describes the category manipulations you may need in an
e-commerce site, as they would be performed against the schema above. The
examples use the Python programming language and the ``pymongo`` MongoDB
driver, but implementations would be similar in other languages.

Read and Display a Category
---------------------------

The simplest operation is reading and displaying a category. In this
case, you might want to display a category along with a list of "bread
crumbs" leading back up the hierarchy. In an e-commerce site, you'll
most likely have the slug of the category available for your query, as it can be
parsed from the URL.

.. code-block:: python

   # find_one returns the single matching document rather than a cursor.
   category = db.categories.find_one(
       {'slug': slug},
       {'_id': 0, 'name': 1, 'ancestors.slug': 1, 'ancestors.name': 1})

Here, the slug is used to retrieve the category, fetching only those
fields needed for display.
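
As an illustration of how the fetched document might be turned into bread
crumbs, the following sketch works on a plain dictionary shaped like the query
result. The ``breadcrumbs`` helper is hypothetical (it is not part of
``pymongo``), and it assumes that the ``ancestors`` list is stored nearest
ancestor first, as in the schema example above:

.. code-block:: python

   def breadcrumbs(category):
       """Return (name, slug) pairs from the root down to ``category``.

       Assumes ``ancestors`` is ordered nearest ancestor first, so the
       list is reversed to start at the root.
       """
       trail = [(a['name'], a['slug'])
                for a in reversed(category.get('ancestors', []))]
       # The projection above omits the category's own slug, so it may be None.
       trail.append((category['name'], category.get('slug')))
       return trail

   # A document shaped like the result of the query above.
   category = {
       'name': 'Modal Jazz',
       'ancestors': [
           {'slug': 'bop', 'name': 'Bop'},
           {'slug': 'ragtime', 'name': 'Ragtime'}]}

   # Prints: Ragtime > Bop > Modal Jazz
   print(' > '.join(name for name, _ in breadcrumbs(category)))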

Index Support
~~~~~~~~~~~~~

To support this common operation efficiently, you'll need an index
on the ``slug`` field. Since ``slug`` is also intended to be unique, the index
over it should be unique as well:

.. code-block:: python

   db.categories.ensure_index('slug', unique=True)

Add a Category to the Hierarchy
-------------------------------

Adding a category to a hierarchy is relatively simple. Suppose you wish
to add a new category 'Swing' as a child of 'Ragtime':

.. figure:: img/ecommerce-category2.png
   :align: center
   :alt: Adding a category

   Adding a category

In this case, the initial insert is simple enough, but after this
insert, the "Swing" category is still missing its ancestors array. To define
this, you'll need a helper function to build the ancestor list:

.. code-block:: python

   def build_ancestors(_id, parent_id):
       # Fetch the parent's display fields and its stored ancestor list.
       parent = db.categories.find_one(
           {'_id': parent_id},
           {'name': 1, 'slug': 1, 'ancestors': 1})
       # A root category may have no ancestors field, so default to [].
       parent_ancestors = parent.pop('ancestors', [])
       ancestors = [ parent ] + parent_ancestors
       db.categories.update(
           {'_id': _id},
           {'$set': { 'ancestors': ancestors } })

Note that you only need to travel one level up the hierarchy, to Ragtime, to
build Swing's entire ancestor list, because Ragtime's own ancestor list is
already stored. Now you can actually perform the insert and build the
ancestor list:

.. code-block:: python

   doc = dict(name='Swing', slug='swing', parent=ragtime_id)
   swing_id = db.categories.insert(doc)
   build_ancestors(swing_id, ragtime_id)
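
To see what the helper produces without a running server, here is a minimal
in-memory sketch in which a plain dictionary keyed by ``_id`` stands in for
``db.categories``. The ``add_category`` function and the integer ids are
assumptions made for illustration only:

.. code-block:: python

   def add_category(categories, _id, name, slug, parent_id=None):
       """Insert a category and derive its ancestors from its parent alone."""
       ancestors = []
       if parent_id is not None:
           parent = categories[parent_id]
           # The parent's own entry comes first, followed by its ancestors.
           ancestors = [
               {'_id': parent['_id'],
                'name': parent['name'],
                'slug': parent['slug']}
           ] + list(parent['ancestors'])
       categories[_id] = {
           '_id': _id, 'name': name, 'slug': slug,
           'parent': parent_id, 'ancestors': ancestors}
       return _id

   categories = {}
   ragtime_id = add_category(categories, 1, 'Ragtime', 'ragtime')
   swing_id = add_category(categories, 2, 'Swing', 'swing', ragtime_id)

   # Prints: ['Ragtime']
   print([a['name'] for a in categories[swing_id]['ancestors']])

Only the parent is consulted, which is safe here because the parent's stored
ancestor list is assumed to be correct at insert time.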

Index Support
~~~~~~~~~~~~~

Since these queries and updates all select on ``_id``, you only need
the default MongoDB-supplied index on ``_id`` to support this operation
efficiently.

Change the Ancestry of a Category
---------------------------------

Suppose you wish to reorganize the hierarchy by moving 'bop' under
'swing':

.. figure:: img/ecommerce-category3.png
   :align: center
   :alt: Change the parent of a category

   Change the parent of a category

The initial update is straightforward:

.. code-block:: python

   db.categories.update(
       {'_id': bop_id}, {'$set': { 'parent': swing_id } })

Now, you still need to update the ancestor list for bop and all its
descendants. Here you cannot rely on the stored ancestor list of a
parent category being up to date, since MongoDB may return the affected
categories in any order and a descendant may be processed before its parent
has been rebuilt. To handle this, you'll need a new ancestor-building
function that follows the ``parent`` links all the way to the root:

.. code-block:: python

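   def build_ancestors_full(_id, parent_id):
       ancestors = []
       # Walk the parent links up to the root, nearest ancestor first.
       while parent_id is not None:
           # Do not fetch the parent's stored ancestors: they may be stale,
           # and they would end up nested inside each ancestor entry.
           parent = db.categories.find_one(
               {'_id': parent_id},
               {'parent': 1, 'name': 1, 'slug': 1})
           parent_id = parent.pop('parent', None)
           ancestors.append(parent)
       db.categories.update(
           {'_id': _id},
           {'$set': { 'ancestors': ancestors } })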

Now, at the expense of a few more queries up the hierarchy, you can
rebuild the ancestor lists of 'bop' and all of its descendants:

.. code-block:: python

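   # Rebuild bop itself, then each category that lists bop as an ancestor.
   build_ancestors_full(bop_id, swing_id)
   for cat in db.categories.find(
       {'ancestors._id': bop_id},
       {'parent': 1}):
       build_ancestors_full(cat['_id'], cat['parent'])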
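
The same idea can be tried out without a server. In this in-memory sketch, a
dictionary keyed by ``_id`` stands in for ``db.categories``; the helper names
``ancestors_from_parents`` and ``move_category`` and the integer ids are
assumptions for illustration, not part of the original solution:

.. code-block:: python

   def ancestors_from_parents(categories, _id):
       """Walk the ``parent`` links upward, ignoring any stored ancestors."""
       ancestors = []
       parent_id = categories[_id]['parent']
       while parent_id is not None:
           parent = categories[parent_id]
           ancestors.append({'_id': parent['_id'],
                             'name': parent['name'],
                             'slug': parent['slug']})
           parent_id = parent['parent']
       return ancestors

   def move_category(categories, _id, new_parent_id):
       """Re-parent a category, then rebuild it and all of its descendants."""
       # Collect the descendants first, using the stored ancestor lists.
       affected = [_id] + [
           c['_id'] for c in categories.values()
           if any(a['_id'] == _id for a in c['ancestors'])]
       categories[_id]['parent'] = new_parent_id
       for cat_id in affected:
           categories[cat_id]['ancestors'] = ancestors_from_parents(
               categories, cat_id)

   categories = {
       1: {'_id': 1, 'name': 'Ragtime', 'slug': 'ragtime', 'parent': None},
       2: {'_id': 2, 'name': 'Bop', 'slug': 'bop', 'parent': 1},
       3: {'_id': 3, 'name': 'Modal Jazz', 'slug': 'modal-jazz', 'parent': 2},
       4: {'_id': 4, 'name': 'Swing', 'slug': 'swing', 'parent': 1},
   }
   for cat_id in categories:
       categories[cat_id]['ancestors'] = ancestors_from_parents(
           categories, cat_id)

   move_category(categories, 2, 4)  # move Bop under Swing

   # Prints: ['Bop', 'Swing', 'Ragtime']
   print([a['name'] for a in categories[3]['ancestors']])

Because every list is rebuilt from the ``parent`` links alone, the order in
which the affected categories are processed does not matter. The sketch does
not guard against moving a category underneath one of its own descendants,
which would create a cycle.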

Index Support
~~~~~~~~~~~~~

In this case, an index on ``ancestors._id`` would be helpful in
determining which descendants need to be updated:

.. code-block:: python

   db.categories.ensure_index('ancestors._id')

Rename a Category
-----------------

Renaming a category would normally be an extremely quick operation, but
in this case, because the name is denormalized into each descendant's
ancestor list, you also need to update the descendants. Suppose you need to
rename "Bop" to "BeBop":

.. figure:: img/ecommerce-category4.png
   :align: center
   :alt: Rename a category

   Rename a category

First, you need to update the category name itself:

.. code-block:: python

   db.categories.update(
       {'_id': bop_id}, {'$set': { 'name': 'BeBop' } })

Next, you need to update each descendant's ancestors list:

.. code-block:: python

   db.categories.update(
       {'ancestors._id': bop_id},
       {'$set': { 'ancestors.$.name': 'BeBop' } },
       multi=True)

Here, the positional operator ``$`` refers to the exact entry in the
``ancestors`` array that matched the query, and the ``multi`` option on the
update ensures that the rename is applied to every matching document in a
single server round-trip.
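
To make the effect of the positional update concrete, the following sketch
mimics it on a list of plain dictionaries. The ``rename_in_ancestors`` function
is a hypothetical stand-in written for this illustration; it is not how the
server implements the update:

.. code-block:: python

   def rename_in_ancestors(categories, ancestor_id, new_name):
       """Rename one ancestor entry in every document that contains it."""
       modified = 0
       for doc in categories:          # multi=True: visit every document
           for entry in doc.get('ancestors', []):
               if entry['_id'] == ancestor_id:
                   # ``$`` stands for the first array element that matched.
                   entry['name'] = new_name
                   modified += 1
                   break
       return modified

   categories = [
       {'_id': 3, 'name': 'Modal Jazz', 'ancestors': [
           {'_id': 2, 'name': 'Bop', 'slug': 'bop'},
           {'_id': 1, 'name': 'Ragtime', 'slug': 'ragtime'}]},
       {'_id': 4, 'name': 'Swing', 'ancestors': [
           {'_id': 1, 'name': 'Ragtime', 'slug': 'ragtime'}]},
   ]

   # Prints: 1
   print(rename_in_ancestors(categories, 2, 'BeBop'))

Only the ``name`` of the matching entry changes; its ``slug`` and the other
entries in the array are left untouched.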

Index Support
~~~~~~~~~~~~~

In this case, the index you have already defined on ``ancestors._id`` is
sufficient to ensure good performance.

Sharding
========

In this solution, it is unlikely that you would want to shard the
collection since it's likely to be quite small. If you *should* decide to
shard, the fact that most updates select on ``_id`` makes ``_id`` a
natural shard key. The command you'd use to shard the category
collection would then look like the following:

.. code-block:: python

   >>> db.command('shardCollection', 'categories', key={'_id': 1})
   { "collectionsharded" : "categories", "ok" : 1 }

Note that the shard key must be specified explicitly; MongoDB does not
choose one for you. Depending on your deployment, the command may also need
to be issued against the ``admin`` database with the fully qualified
collection name.