@@ -24,12 +24,6 @@ Testing has shown that |compass| has minimal impact in prototype
deployments, though additional performance testing and monitoring is in
progress.

- For best results, use MongoDB 3.2 or higher, which includes the
- :manual:`$sample </reference/operator/aggregation/sample/>` operator for
- efficient sampling on a collection. On older versions of MongoDB,
- |compass| falls back on a
- :ref:`less efficient sampling method <compass_fallback_sampling>`.
-
You should only execute queries that are indexed appropriately in the
database to avoid scanning the entire collection.

@@ -64,63 +58,6 @@ Why am I seeing a warning about a non-genuine MongoDB server?

.. include:: /includes/fact-non-genuine-warning.rst

- .. _compass-faq-sampling:
-
- What is sampling and why is it used?
- ------------------------------------
-
- Sampling in |compass| is the selection a subset of data
- from a particular collection and analyzing the documents within the
- sample set.
-
- Sampling is a common technique in statistical analysis because analyzing
- a subset of the data gives similar results to analyzing all of it. In
- addition, sampling allows results to be generated quickly rather than
- performing a computationally-expensive collection scan.
-
- How does sampling work?
- -----------------------
-
- |compass| employs two distinct sampling mechanisms.
-
- In MongoDB 3.2, collections are sampled with the
- :manual:`$sample </reference/operator/aggregation/sample/>` operator via
- the :manual:`aggregation pipeline </core/aggregation-pipeline>`. This
- provides efficient random sampling without replacement over the entire
- collection, or over the subset of documents specified by a query.
-
- .. _compass_fallback_sampling:
-
- In MongoDB 3.0, collections are sampled via a
- backwards-compatible algorithm executed entirely within |compass|. It
- takes place in three stages:
-
- 1. |compass| opens a :term:`cursor` on the desired collection, limited
-    to at most 10,000 documents sorted in descending order of the ``_id``
-    field.
- 2. ``sampleSize`` documents are randomly selected from the stream. To
-    do this efficiently, |compass| employs `reservoir sampling
-    <http://en.wikipedia.org/wiki/Reservoir_sampling>`_.
- 3. |compass| performs a query to select the chosen documents directly
-    via ``_id``.
-
- ``sampleSize`` is set to 1000 documents.
-
- .. note::
-    The choice of sampling method is done transparently in the
-    background, with no changes required by the user.
-
- Won't sampling miss documents?
- ------------------------------
-
- Sampling is chosen for its efficiency: the amount of time required to
- perform a sample is minimal, on the order of a few seconds. Increasing
- the sample confidence will demand more processing power and time.
- Furthermore, sophisticated outlier detection requires an inspection of
- every document in a MongoDB deployment, which would be unfeasible for
- large data sets. The MongoDB team is in the process of conducting user
- tests on large data sets to find a reasonable balance.
-
What happens to long running queries?
-------------------------------------

@@ -133,9 +70,9 @@ Slow Sampling
All queries that Compass sends to your MongoDB instance have a timeout
flag set which automatically aborts a request if it takes longer than
the specified timeout. This timeout is currently set to 10 seconds. If
- sampling on the database takes longer, Compass will notify you about
- the timeout and give you the options of (a) retrying with a longer
- timeout (60 seconds) or (b) running a different query.
+ :ref:`sampling <sampling>` on the database takes longer, Compass will
+ notify you about the timeout and give you the options of (a) retrying
+ with a longer timeout (60 seconds) or (b) running a different query.

.. note::
