060 distributed search/00_Intro.asciidoc and 05_Query_phase.asciidoc #253

Merged
19 changes: 19 additions & 0 deletions 060_Distributed_Search/00_Intro.asciidoc
@@ -32,3 +32,22 @@ But finding all matching documents is only half the story. Results from
multiple shards must be combined into a single sorted list before the `search`
API can return a ``page'' of results. For this reason, search is executed in a
two-phase process called _query then fetch_.
[[分布式检索]]
== Distributed Search Execution

Before moving on, let's discuss how search is executed in a distributed environment.((("distributed search execution"))) It is a bit more involved than the basic _create-read-update-delete_ (CRUD) requests((("CRUD (create-read-update-delete) operations"))) that we discussed in <<distributed-docs>>.

.Content Note
****

You may find this chapter interesting to read, but you don't need to understand and memorize all the details in order to use Elasticsearch.

Read it to build a mental sketch of how the engine works and to know where the information lives for when you need it, but don't get bogged down in the details.

****

A CRUD operation deals with a single document that has a unique combination of `_index`, `_type`, and <<routing-value,`routing` values>> (which defaults to the document's `_id`). This means that we know exactly which shard in the cluster holds that document.
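
For illustration, the shard-routing rule that the paragraph above relies on can be sketched in Python. This is a toy stand-in: Elasticsearch actually hashes the routing value with murmur3, while Python's built-in `hash` is used here for simplicity.

```python
def select_shard(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value (the `_id` by default) to one primary shard.

    Mirrors the rule Elasticsearch uses:
        shard = hash(routing) % number_of_primary_shards
    Python's hash() stands in for the real murmur3 hash.
    """
    return hash(routing) % number_of_primary_shards

# The same routing value always lands on the same shard, which is why
# a CRUD request knows exactly which shard to talk to:
assert select_shard("doc-42", 5) == select_shard("doc-42", 5)
```

Because the formula depends on `number_of_primary_shards`, changing the shard count would re-route every document, which is why the number of primary shards is fixed at index creation time.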

Search requires a more sophisticated execution model, because we don't know which documents will match the query: they could be on any shard in the cluster. A search request has to consult a copy of every shard in every index we're interested in, to find out whether it has any matching documents.

63 changes: 17 additions & 46 deletions 060_Distributed_Search/05_Query_phase.asciidoc
Original file line number Diff line number Diff line change
@@ -1,16 +1,9 @@
=== Query Phase

During the initial _query phase_, the((("distributed search execution", "query phase")))((("query phase of distributed search"))) query is broadcast to a shard copy (a
primary or replica shard) of every shard in the index. Each shard executes
the search locally and ((("priority queue")))builds a _priority queue_ of matching documents.

.Priority Queue
****

A _priority queue_ is just a sorted list that holds the _top-n_ matching
documents. The size of the priority queue depends on the pagination
parameters `from` and `size`. For example, the following search request
would require a priority queue big enough to hold 100 documents:

[source,js]
--------------------------------------------------
@@ -22,52 +15,30 @@ GET /_search
--------------------------------------------------
****
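
The priority queue described in the sidebar can be sketched with Python's `heapq`. This is a toy model, not Elasticsearch internals; `from_` and `size` mirror the pagination parameters.

```python
import heapq

def shard_top_hits(hits, from_=90, size=10):
    """Keep only the best `from_ + size` hits, as a single shard would.

    `hits` is an iterable of (score, doc_id) pairs; the result is
    sorted best-first and never holds more than from_ + size entries.
    """
    # nlargest maintains a bounded heap internally: O(n log(from_ + size))
    return heapq.nlargest(from_ + size, hits)

hits = [(i % 7, f"doc-{i}") for i in range(1000)]
queue = shard_top_hits(hits)   # from=90, size=10 -> room for 100 docs
assert len(queue) == 100
assert queue[0][0] == 6        # best score comes first
```

Note that deep pagination is expensive for exactly this reason: `from=9990, size=10` forces every shard to build and return a 10,000-entry queue.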

The query phase process is depicted in <<img-distrib-search>>.

[[img-distrib-search]]
.Query phase of distributed search
image::images/elas_0901.png["Query phase of distributed search"]

The query phase consists of the following three steps:

1. The client sends a `search` request to `Node 3`, which creates an empty
priority queue of size `from + size`.

2. `Node 3` forwards the search request to a primary or replica copy of every
shard in the index. Each shard executes the query locally and adds the
results into a local sorted priority queue of size `from + size`.

3. Each shard returns the doc IDs and sort values of all the docs in its
priority queue to the coordinating node, `Node 3`, which merges these
values into its own priority queue to produce a globally sorted list of
results.
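
The three steps can be sketched end to end in Python. This is a toy model under illustrative assumptions: hits are (sort value, doc ID) pairs, and `heapq` stands in for the real priority queues.

```python
import heapq

def shard_query(local_hits, from_, size):
    # Step 2: each shard builds a local top-(from + size) queue and
    # returns only lightweight entries: (sort value, doc ID).
    return heapq.nlargest(from_ + size, local_hits)

def coordinate(shards, from_=0, size=3):
    # Step 1: the coordinating node's queue holds from + size entries.
    shard_results = [shard_query(s, from_, size) for s in shards]
    # Step 3: merge the per-shard sorted lists into one global order.
    merged = heapq.merge(*shard_results, reverse=True)
    top = list(merged)[: from_ + size]
    return top[from_:]  # the requested page of results

shards = [
    [(3.2, "a"), (0.5, "b")],  # shard 0's matching docs
    [(2.9, "c"), (1.7, "d")],  # shard 1's matching docs
]
assert coordinate(shards) == [(3.2, "a"), (2.9, "c"), (1.7, "d")]
```

Notice that the merge needs only doc IDs and sort values; the full documents are not fetched until the fetch phase.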

When a search request is sent to a node, that node becomes the coordinating
node.((("nodes", "coordinating node for search requests"))) It is the job of this node to broadcast the search request to all
involved shards, and to gather their responses into a globally sorted result
set that it can return to the client.

The first step is to broadcast the request to a shard copy of every node in
the index. Just like <<distrib-read,document `GET` requests>>, search requests
can be handled by a primary shard or by any of its replicas.((("shards", "handling search requests"))) This is how more
replicas (when combined with more hardware) can increase search throughput.
A coordinating node will round-robin through all shard copies on subsequent
requests in order to spread the load.
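
The round-robin described above can be sketched as a simple rotation over the copies of one shard. This is a simplification with illustrative names; real shard-copy selection in Elasticsearch involves more than a bare rotation.

```python
import itertools

# One primary plus two replicas of the same shard; names illustrative.
copies = ["primary", "replica-1", "replica-2"]
rotation = itertools.cycle(copies)

def pick_shard_copy():
    """Return the next copy in rotation, spreading search load evenly."""
    return next(rotation)

picks = [pick_shard_copy() for _ in range(6)]
assert picks == copies * 2  # each copy served exactly twice
```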

Each shard executes the query locally and builds a sorted priority queue of
length `from + size`&#x2014;in other words, enough results to satisfy the global
search request all by itself. It returns a lightweight list of results to the
coordinating node, which contains just the doc IDs and any values required for
sorting, such as the `_score`.

The coordinating node merges these shard-level results into its own sorted
priority queue, which represents the globally sorted result set. Here the query
phase ends.

[NOTE]
====
An index can consist of one or more primary shards,((("indices", "multi-index search"))) so a search request
against a single index needs to be able to combine the results from multiple
shards. A search against _multiple_ or _all_ indices works in exactly the same
way--there are just more shards involved.
====