From af0e16c1a26122e2c686f97a195d972c546c31e8 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Tue, 18 Dec 2012 20:37:44 -0500 Subject: [PATCH 1/6] DOCS-693 excessive disk space --- source/faq/storage.txt | 145 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 145 insertions(+) diff --git a/source/faq/storage.txt b/source/faq/storage.txt index d1b4b068133..4f0f5d2b51e 100644 --- a/source/faq/storage.txt +++ b/source/faq/storage.txt @@ -101,3 +101,148 @@ active document in memory. For best performance, the majority of your *active* set should fit in RAM. + +Why are the files in my data directory larger than the data in my database? +--------------------------------------------------------------------------- + +The data files in your data directory, which is the ``/data/db`` +directory in default configurations, might be larger than the data set +inserted into the database. This is caused by pre-allocated files and by +empty blocks, as explained here. + +- Preallocated data files + + In the data directory, MongoDB preallocates data files to a particular + size, in part to prevent file system fragmentation. The first filename + for a data file is ``.0``, the next + ``.1``, etc. The first file is preallocated at 64 + megabytes, the next 128 megabytes, and so on, up to 2 gigabytes, at + which point all subsequent files are 2 gigabytes. The data files, + therefore, contain files for which space is allocated but no data yet + exists. A file might preallocated 1 gigabyte but be 90% empty. For + databases of hundreds of gigabytes, unallocated space is small + compared to the database and is insignificant. + + On UNIX, :program:`mongod` preallocates an additional data file + prefilled with zero bytes. Pre-filling in the background prevents + significant delays when a new database file is next allocated. + + You can disable preallocation with the :option:`--nopreallo ` command line option. Do not use this option in production + environments. 
This option is intended for tests with small data sets + where you drop the database after each test. + + On Linux systems you can use ``hdparam`` to get an idea of how costly + allocation might be: + + .. code-block:: sh + + time hdparm --fallocate $((1024*1024)) testfile + +- The :term:`oplog` + + If replication is enabled, the data directory includes the + :term:`oplog.rs ` file, which is a preallocated :term:`capped + collection` in the ``local`` database. The default allocation is + approximately 5% of disk space on 64-bit installations. In most + cases, you should not need to resize the oplog. However, if you do, + see :doc:`/tutorial/change-oplog-size`. + +- Empty blocks + + MongoDB maintains lists of deleted blocks within the data files when + objects or collections are deleted. This space is reused by MongoDB + but never freed to the operating system. + + To reclaim deleted blocks, you can use either of the following: + + - :dbcommand:`compact`, which defragments deleted space. This requires + extra disk space to run and should not be used if you are critically + low on disk space. + + - :dbcommand:`repairDatabase`, which rebuilds the database. Both + options require additional disk space to run. For details, see + :doc:`/tutorial/recover-data-following-unexpected-shutdown`. + + .. warning:: + :dbcommand:`repairDatabase` requires enough free disk space to hold + both the old and new database files while the repair is running. Be + aware that :dbcommand:`repairDatabase` blocks and takes a long time to + complete. + +Can I check the size of a collection? + +------------------------------------- + +To check the size of a collection and other data, use the +:method:`validate() ` method from the +:program:`mongo` shell. The following example issues the command for the +``orders`` collection: + +.. 
code-block:: javascript + + db.orders.validate(); + +Alternately, you can view specific measures of size using any of these methods: + +- :method:`dataSize() `: data size for the collection. +- :method:`storageSize() `: allocation size, including unused space. +- :method:`totalSize() `: the data size plus the index size. +- :method:`totalIndexSize() `: the index size. + +Can I check the size of indexes? +-------------------------------- + +To view the size of the data allocated for an index, :method:`validate() +` method using the index namespace. + +.. example:: You look up an index namespace in the ``system.namespaces`` + collection by issuing the following command: + + .. code-block:: javascript + + db.system.namespaces.find() + + which returns + + .. code-block:: javascript + + {"name" : "test.orders"} + {"name" : "test.system.indexes"} + {"name" : "test.orders.$_id_"} + + View the size of the data allocated for the ``orders.$_id_`` index by + issuing the following command: + + .. code-block:: javascript + + db.orders.$_id_.validate() + +How do I know when the server runs out of disk space? +----------------------------------------------------- + +If your server runs out of disk space for data files, you will see +something like this in the log: + +.. code-block:: sh + + Thu Aug 11 13:06:09 [FileAllocator] allocating new data file dbms/test.13, filling with zeroes... + Thu Aug 11 13:06:09 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072 errno:28 No space left on device + Thu Aug 11 13:06:09 [FileAllocator] will try again in 10 seconds + Thu Aug 11 13:06:19 [FileAllocator] allocating new data file dbms/test.13, filling with zeroes... + Thu Aug 11 13:06:19 [FileAllocator] error failed to allocate new file: dbms/test.13 size: 2146435072 errno:28 No space left on device + Thu Aug 11 13:06:19 [FileAllocator] will try again in 10 seconds + +The server remains in this state forever, blocking all writes including +deletes. 
However, reads still work. To delete some data and compact, +using the :dbcommand:`compact` command, you must first restart the server +first. + +If your server runs out of disk space for journal files, the server +process will exit. By default, journal files are created in the data +directory in a subdirectory called ``journal``, but you may elect to put +the journal files on another disk by using a symlink. + +.. todo What should we do with this info: + These one-line scripts will print the stats for each database and collection: + db._adminCommand("listDatabases").databases.forEach(function (d) {mdb = db.getSiblingDB(d.name); printjson(mdb.stats())}) + db._adminCommand("listDatabases").databases.forEach(function (d) {mdb = db.getSiblingDB(d.name); mdb.getCollectionNames().forEach(function(c) {s = mdb[c].stats(); printjson(s)})}) From 17e6af251ed9434ad2abe1470fd103e6346fa2a7 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Thu, 20 Dec 2012 18:31:26 -0500 Subject: [PATCH 2/6] DOCS-693 review edits --- source/faq/storage.txt | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/source/faq/storage.txt b/source/faq/storage.txt index 4f0f5d2b51e..dfa9abe2b06 100644 --- a/source/faq/storage.txt +++ b/source/faq/storage.txt @@ -105,9 +105,9 @@ RAM. Why are the files in my data directory larger than the data in my database? --------------------------------------------------------------------------- -The data files in your data directory, which is the ``/data/db`` +The data files in your data directory, which is the :file:`/data/db` directory in default configurations, might be larger than the data set -inserted into the database. This is caused by pre-allocated files and by +inserted into the database. This is caused by preallocated files and by empty blocks, as explained here. - Preallocated data files @@ -119,20 +119,20 @@ empty blocks, as explained here. 
megabytes, the next 128 megabytes, and so on, up to 2 gigabytes, at which point all subsequent files are 2 gigabytes. The data files, therefore, contain files for which space is allocated but no data yet - exists. A file might preallocated 1 gigabyte but be 90% empty. For + exists. A file might preallocate 1 gigabyte but be 90% empty. For databases of hundreds of gigabytes, unallocated space is small compared to the database and is insignificant. On UNIX, :program:`mongod` preallocates an additional data file - prefilled with zero bytes. Pre-filling in the background prevents + filled with zero bytes. Pre-filling in the background prevents significant delays when a new database file is next allocated. - You can disable preallocation with the :option:`--nopreallo ` command line option. Do not use this option in production + You can disable preallocation with the :option:`--noprealloc ` command line option. Do not use this option in production environments. This option is intended for tests with small data sets where you drop the database after each test. - On Linux systems you can use ``hdparam`` to get an idea of how costly + On Linux systems you can use ``hdparm`` to get an idea of how costly allocation might be: .. code-block:: sh @@ -192,7 +192,7 @@ Alternately, you can view specific measures of size using any of these methods: Can I check the size of indexes? -------------------------------- -To view the size of the data allocated for an index, :method:`validate() +To view the size of the data allocated for an index, issue the :method:`validate() ` method using the index namespace. .. example:: You look up an index namespace in the ``system.namespaces`` @@ -234,7 +234,7 @@ something like this in the log: The server remains in this state forever, blocking all writes including deletes. However, reads still work. 
To delete some data and compact, -using the :dbcommand:`compact` command, you must first restart the server +using the :dbcommand:`compact` command, you must restart the server first. If your server runs out of disk space for journal files, the server From dc1a5b0c5c33a1415548b71db4102c936482a31f Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Thu, 3 Jan 2013 18:05:39 -0500 Subject: [PATCH 3/6] DOCS-693 review edits --- source/faq/storage.txt | 48 +++++++++++++++++++++++++++--------------- 1 file changed, 31 insertions(+), 17 deletions(-) diff --git a/source/faq/storage.txt b/source/faq/storage.txt index dfa9abe2b06..b2c94072cf4 100644 --- a/source/faq/storage.txt +++ b/source/faq/storage.txt @@ -107,8 +107,7 @@ Why are the files in my data directory larger than the data in my database? The data files in your data directory, which is the :file:`/data/db` directory in default configurations, might be larger than the data set -inserted into the database. This is caused by preallocated files and by -empty blocks, as explained here. +inserted into the database. This is caused by the following: - Preallocated data files @@ -124,7 +123,7 @@ empty blocks, as explained here. compared to the database and is insignificant. On UNIX, :program:`mongod` preallocates an additional data file - filled with zero bytes. Pre-filling in the background prevents + and initializes the disk space ``0``. Pre-allocating data files in the background prevents significant delays when a new database file is next allocated. You can disable preallocation with the :option:`--noprealloc ` method from the -:program:`mongo` shell. The following example issues the command for the -``orders`` collection: +To view the size of a collection and other information, such as whether +the collection is sharded, use the :method:`stats() +` method from the :program:`mongo` shell. The +following example issues :method:`stats() ` for +the ``orders`` collection: .. 
code-block:: javascript - db.orders.validate(); + db.orders.stats(); -Alternately, you can view specific measures of size using any of these methods: +To view specific measures of size, use these methods: - :method:`dataSize() `: data size for the collection. - :method:`storageSize() `: allocation size, including unused space. @@ -192,17 +201,22 @@ Alternately, you can view specific measures of size using any of these methods: Can I check the size of indexes? -------------------------------- -To view the size of the data allocated for an index, issue the :method:`validate() -` method using the index namespace. +To view the size of the data allocated for an index, you can do either of the following: + +- Issue the :method:`stats() ` method using the + index namespace. To retrieve a list of namespaces, issue the following command: + ``db.system.namespaces.find()`` + +- Issue the :stats:`stats().indexSizes ` + command. -.. example:: You look up an index namespace in the ``system.namespaces`` - collection by issuing the following command: +.. example:: Issue the following command to retrieve index namespaces: .. code-block:: javascript db.system.namespaces.find() - which returns + The command returns a list similar to the following: .. code-block:: javascript @@ -215,7 +229,7 @@ To view the size of the data allocated for an index, issue the :method:`validate .. code-block:: javascript - db.orders.$_id_.validate() + db.orders.$_id_.stats() How do I know when the server runs out of disk space? 
----------------------------------------------------- From 7c8e3aca07ba382411d03d749c4a40b035eb77dc Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Fri, 4 Jan 2013 11:41:04 -0500 Subject: [PATCH 4/6] DOCS-693 review edits --- source/faq/storage.txt | 34 +++++++++++++++++++++------------- 1 file changed, 21 insertions(+), 13 deletions(-) diff --git a/source/faq/storage.txt b/source/faq/storage.txt index b2c94072cf4..514871b54d0 100644 --- a/source/faq/storage.txt +++ b/source/faq/storage.txt @@ -123,7 +123,7 @@ inserted into the database. This is caused by the following: compared to the database and is insignificant. On UNIX, :program:`mongod` preallocates an additional data file - and initializes the disk space ``0``. Pre-allocating data files in the background prevents + and initializes the disk space ``0``. Preallocating data files in the background prevents significant delays when a new database file is next allocated. You can disable preallocation with the :option:`--noprealloc `: the data size plus the index size. - :method:`totalIndexSize() `: the index size. +Also, the following scripts will print the stats for each database and +collection: + +.. code-block:: javascript + + db.adminCommand("listDatabases").databases.forEach(function (d) {var mdb = db.getSiblingDB(d.name); printjson(mdb.stats())}) + +.. code-block:: javascript + + db.adminCommand("listDatabases").databases.forEach(function (d) {var mdb = db.getSiblingDB(d.name); mdb.getCollectionNames().forEach(function (c) {printjson(mdb.getCollection(c).stats())})}) + Can I check the size of indexes? -------------------------------- @@ -254,9 +267,4 @@ first. If your server runs out of disk space for journal files, the server process will exit. By default, journal files are created in the data directory in a subdirectory called ``journal``, but you may elect to put -the journal files on another disk by using a symlink. - -.. 
todo What should we do with this info: - These one-line scripts will print the stats for each database and collection: - db._adminCommand("listDatabases").databases.forEach(function (d) {mdb = db.getSiblingDB(d.name); printjson(mdb.stats())}) - db._adminCommand("listDatabases").databases.forEach(function (d) {mdb = db.getSiblingDB(d.name); mdb.getCollectionNames().forEach(function(c) {s = mdb[c].stats(); printjson(s)})}) +the journal files on another disk by using a symlink. \ No newline at end of file From a647456c0c957ddc0027ea84470eeeaa50dc7632 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Fri, 4 Jan 2013 11:59:59 -0500 Subject: [PATCH 5/6] DOCS-693 minor --- source/faq/storage.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/source/faq/storage.txt b/source/faq/storage.txt index 514871b54d0..e7e1819fefe 100644 --- a/source/faq/storage.txt +++ b/source/faq/storage.txt @@ -200,7 +200,7 @@ To view specific measures of size, use these methods: - :method:`totalSize() `: the data size plus the index size. - :method:`totalIndexSize() `: the index size. -Also, the following scripts will print the stats for each database and +Also, the following scripts print the statistics for each database and collection: .. code-block:: javascript From 56fb7cb25cb90a5e0e45ecb231960801183cc9fd Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Fri, 4 Jan 2013 13:36:11 -0500 Subject: [PATCH 6/6] DOCS-693 review edits --- source/faq/storage.txt | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/source/faq/storage.txt b/source/faq/storage.txt index e7e1819fefe..5ec541b47f1 100644 --- a/source/faq/storage.txt +++ b/source/faq/storage.txt @@ -123,7 +123,7 @@ inserted into the database. This is caused by the following: compared to the database and is insignificant. On UNIX, :program:`mongod` preallocates an additional data file - and initializes the disk space ``0``. 
Preallocating data files in the background prevents + and initializes the disk space to ``0``. Preallocating data files in the background prevents significant delays when a new database file is next allocated. You can disable preallocation with the :option:`--noprealloc