FAQs on storage #30

Merged
merged 7 commits into from
May 21, 2012
86 changes: 49 additions & 37 deletions draft/faq/storage.txt
@@ -4,7 +4,7 @@ FAQ: MongoDB Storage

.. default-domain:: mongodb

This document addresses common question regarding MongoDB's storage
This document addresses common questions regarding MongoDB's storage
system.

If you don't find the answer you're looking for, check
@@ -15,47 +15,30 @@ the :doc:`complete list of FAQs </faq>` or post your question to the
:backlinks: none
:local:

What are Memory Mapped Files?
What are memory mapped files?
-----------------------------

Memory mapped files are ____
Memory mapped files are files that the operating system has placed
into a region of virtual memory using the system call ``mmap()``. MongoDB uses
memory mapped files as its storage engine. By using memory mapped
files MongoDB can treat the content of its data files as if they were
in memory. This provides MongoDB with an extremely fast and simple
method for accessing and manipulating data.

What is the working set?
------------------------

The working set is ____

A common misconception in using MongoDB is that the working set can be
reduced to a discrete value. It's important to understand that the
working set is simply a way of thinking about the data one is
accessing and that which MongoDB is working with frequently.

For instance, if you are running a map/reduce job in which the job
reads every document, then your working set is every document.
Conversely, if your map/reduce job only reads the most recent 100
documents, then the working set will be those 100 documents.
How do memory mapped files work?
--------------------------------

How does memory-mapped file access work?
----------------------------------------
Memory mapping assigns files to a block of virtual memory with a
direct byte-for-byte correlation. Once mapped, the relationship
between file and memory allows MongoDB to interact with the data in
the file as if it were memory.
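
The byte-for-byte relationship described above can be seen with Python's standard ``mmap`` module. This is only a minimal sketch of the general technique, not MongoDB's own code; the helper names ``patch_through_mapping`` and ``demo`` and the temporary file are assumptions made for illustration.

```python
import mmap
import os
import tempfile


def patch_through_mapping(path, offset, new_bytes):
    """Map the whole file, overwrite bytes in place, return the mapped contents."""
    with open(path, "r+b") as f:
        # Length 0 maps the entire file; the default access mode is read/write.
        with mmap.mmap(f.fileno(), 0) as mapped:
            # Slice assignment writes straight into the mapped region.
            # The replacement must have the same length as the slice.
            mapped[offset:offset + len(new_bytes)] = new_bytes
            # Ask the OS to write the modified pages back to the file.
            mapped.flush()
            return mapped[:]


def demo():
    """Create a scratch file, edit it through the mapping, then re-read it."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"hello mapped file")
        in_memory = patch_through_mapping(path, 0, b"HELLO")
        with open(path, "rb") as f:
            on_disk = f.read()
        return in_memory, on_disk
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The program never calls ``write()`` on the file after mapping it: the change is made with ordinary memory operations and the operating system carries it back to the file.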

MongoDB uses memory-mapped files for its data file management.
How does MongoDB work with memory mapped files?
-----------------------------------------------

When MongoDB memory-maps the data files (for, say, a map/reduce
query), you're letting the OS know you'd like the contents of the
files available as if they were in some portion of memory.

This doesn't necessarily mean it's in memory already-- when you go to
access any point, the OS checks if this 'page' is in physical ram or
not.

If it is, it returns whatever's in memory in that location. If
it's not, then it will fetch that portion of the file, make sure it's
in memory, and then return it to you.

Writing works in the same fashion-- MongoDB tries to write to a memory
page. If it's in RAM, then it works quickly (just swapping some bits
in the memory). The page will then be marked as 'dirty' and the OS
will take care of flushing it back to disk, persisting your changes.
MongoDB uses memory mapped files for managing and interacting with all
data. MongoDB memory maps data files to memory as it accesses
documents. Data that isn't accessed is *not* mapped to memory.

What are page faults?
---------------------
@@ -64,12 +47,41 @@ Page faults will occur if you're attempting to access some part of a
memory-mapped file that *isn't* in memory.

This could potentially force the OS to find some not-recently-used
page in physical RAM, get rid of it (maaybe write it back to disk if
page in physical RAM, get rid of it (maybe write it back to disk if
it's changed since it loaded), go back to disk, read the page, and
load it into RAM...an expensive task, overall.

What is the difference between soft and hard page faults?
---------------------------------------------------------

:term:`Page faults <page fault>` occur when MongoDB needs access to
data that isn't currently in active memory. A "hard" page fault
refers to situations when MongoDB must access a disk to access the
data. A "soft" page fault, by contrast, merely moves memory pages from
one list to another, and does not require as much time to complete.
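
On Unix-like systems the operating system keeps separate counters for the two kinds of fault, and a process can read its own counters. The sketch below uses Python's standard ``resource`` module to do so. It measures the Python process itself, not a MongoDB server, and it assumes a Unix platform where ``resource`` is available; the helper names are hypothetical.

```python
import resource


def page_fault_counts():
    """Return (soft, hard) page fault totals for the current process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_minflt: faults serviced without disk I/O (soft).
    # ru_majflt: faults that required disk I/O (hard).
    return usage.ru_minflt, usage.ru_majflt


def faults_during(work):
    """Run work() and return the (soft, hard) faults counted while it ran."""
    soft_before, hard_before = page_fault_counts()
    work()
    soft_after, hard_after = page_fault_counts()
    return soft_after - soft_before, hard_after - hard_before


def touch_fresh_memory(size=32 * 1024 * 1024, page=resource.getpagesize()):
    """Allocate a buffer and write one byte in every page of it."""
    buf = bytearray(size)
    for i in range(0, size, page):
        buf[i] = 1


if __name__ == "__main__":
    soft, hard = faults_during(touch_fresh_memory)
    print("soft faults:", soft, "hard faults:", hard)
```

The exact numbers depend on the platform and on how much memory is free, so the sketch prints whatever it observes rather than expecting particular values.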

What tools can I use to investigate storage use in MongoDB?
-----------------------------------------------------------

The :func:`db.stats()` function in the :program:`mongo` shell
outputs the current state of the "active" database. The
:doc:`/reference/database-statistics` document outlines the meaning of
the fields output by :func:`db.stats()`.

What is the working set?
------------------------

The working set represents the total body of data that the application
uses in the course of normal operation. Often this is a subset of the
total data size, but the specific size of the working set depends on
actual moment-to-moment use of the database.

If you run a query that requires MongoDB to scan every document in a collection, the working set will expand to include every document. Depending on physical memory size, this may cause working set documents in other collections to be ejected from physical memory. The next time such documents are accessed, a hard page fault may be incurred.

If you run a query that requires MongoDB to scan every
:term:`document` in a collection, the working set includes every active
document in memory.

For best performance, the majority of your *active* set should fit in
RAM.
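
The effect of a full collection scan on the working set can be illustrated with a toy simulation. The sketch below models physical memory as a strict least-recently-used cache of pages, which is a simplification: real operating systems use more elaborate replacement policies, and the collection names, page counts, and capacity here are invented for the example.

```python
from collections import OrderedDict


class PageCache:
    """Toy model of physical RAM: holds `capacity` pages, evicts the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.faults = 0

    def access(self, page_id):
        if page_id in self.pages:
            # Already resident: mark as most recently used.
            self.pages.move_to_end(page_id)
            return
        # Not resident: count a fault and load the page.
        self.faults += 1
        self.pages[page_id] = True
        if len(self.pages) > self.capacity:
            # Evict the least recently used page.
            self.pages.popitem(last=False)


def count_faults(cache, page_ids):
    """Access every page in order and return how many faults that caused."""
    before = cache.faults
    for page_id in page_ids:
        cache.access(page_id)
    return cache.faults - before


def demo():
    cache = PageCache(capacity=100)
    hot = [("orders", i) for i in range(80)]   # working set that fits in RAM
    scan = [("logs", i) for i in range(500)]   # full scan larger than RAM
    warmup = count_faults(cache, hot)       # first touch loads every page
    steady = count_faults(cache, hot)       # everything is already resident
    count_faults(cache, scan)               # the scan pushes the hot pages out
    after_scan = count_faults(cache, hot)   # the hot pages must be reloaded
    return warmup, steady, after_scan


if __name__ == "__main__":
    print(demo())  # (80, 0, 80)
```

While the 80 hot pages fit in the 100-page cache, repeated access causes no faults. After the 500-page scan every hot page has been evicted, so the next pass over the hot set faults on all 80 pages again.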