From 18b8647c5812721adea56cebb0efe33bf1167a93 Mon Sep 17 00:00:00 2001 From: Josephjin Date: Tue, 6 Sep 2016 22:20:23 +0800 Subject: [PATCH 1/8] chapter/chapter10_25-50 --- 010_Intro/25_Tutorial_Indexing.asciidoc | 52 ++++++++++++++-- 010_Intro/30_Tutorial_Search.asciidoc | 59 +++++++++++++++++-- 010_Intro/35_Tutorial_Aggregations.asciidoc | 17 +++++- 010_Intro/40_Tutorial_Conclusion.asciidoc | 6 ++ 010_Intro/45_Distributed.asciidoc | 15 ++++- 010_Intro/50_Conclusion.asciidoc | 5 ++ 130_Partial_Matching/05_Postcodes.asciidoc | 25 ++++---- .../40_Fuzzy_match_query.asciidoc | 17 ++---- 400_Relationships/20_Denormalization.asciidoc | 22 +++---- 9 files changed, 166 insertions(+), 52 deletions(-) diff --git a/010_Intro/25_Tutorial_Indexing.asciidoc b/010_Intro/25_Tutorial_Indexing.asciidoc index 4bfe58049..0ac6cbcd1 100644 --- a/010_Intro/25_Tutorial_Indexing.asciidoc +++ b/010_Intro/25_Tutorial_Indexing.asciidoc @@ -3,21 +3,28 @@ To give you a feel for what is possible in Elasticsearch and how easy it is to use, let's start by walking through a simple tutorial that covers basic concepts such as indexing, search, and aggregations. +为了弄清楚 Elasticsearch 能实现什么以及实现的简易程度,让我们从一个简单的教程开始并介绍诸如索引、搜索以及聚合等基础概念。 We'll introduce some new terminology and basic concepts along the way, but it is OK if you don't understand everything immediately. We'll cover all the concepts introduced here in _much_ greater depth throughout the rest of the book. +我们将结合一些新的技术术语以及基础概念介绍,当然即使现在你无法全盘理解,这并不妨碍你着手学习 Elasticsearch。 +在本书后续内容中,我们将以更深入的角度去介绍以及剖析所有知识点。 So, sit back and enjoy a whirlwind tour of what Elasticsearch is capable of. +接下来,让我们快速入门 Elasticsearch。 ==== Let's Build an Employee Directory +==== 第一步:让我们创建一个雇员目录 We happen((("employee directory, building (example)"))) to work for _Megacorp_, and as part of HR's new _"We love our drones!"_ initiative, we have been tasked with creating an employee directory. 
The directory is supposed to foster employer empathy and real-time, synergistic, dynamic collaboration, so it has a few
business requirements:
+撰写此书时,我们正好受聘于 Megacorp 公司。作为 HR 部门新的激励项目“我们爱无人机!”的一部分,我们接到了创建一个雇员目录的任务。这个目录旨在促进雇主同理心以及实时、协同、动态的协作,因此它有一些业务需求:
+

* Enable data to contain multi value tags, numbers, and full text.
* Retrieve the full details of any employee.
@@ -26,18 +33,27 @@ business requirements:
* Return highlighted search _snippets_ from the text in the matching
  documents.
* Enable management to build analytic dashboards over the data.
+* 支持数据包含多值标签、数值以及全文内容
+* 检索任一员工的完整信息
+* 支持结构化搜索,例如找到年龄在 30 岁以上的员工
+* 支持简单的全文搜索以及更复杂的短语搜索
+* 在匹配的文档中返回搜索的片段文本
+* 支持管理层基于数据构建分析仪表盘
+

=== Indexing Employee Documents
+=== 第二步:索引员工文档

The first order of business is storing employee data.((("documents", "indexing")))((("indexing"))) This will take the form of an _employee document_: a single document represents a single employee. The act of storing data in Elasticsearch is called _indexing_, but before we can index a document, we need to decide _where_ to store it.
+首要任务是存储员工数据。数据将以“员工文档”的形式存储:一个文档代表一个员工。在 Elasticsearch 中存储数据的行为叫做索引(indexing),但在索引一个文档之前,我们需要决定将它存储在哪里。

-An Elasticsearch cluster can((("clusters", "indices in")))(((in clusters"))) contain multiple _indices_, which in
-turn contain multiple _types_.((("tables"))) These types hold multiple _documents_,
+An Elasticsearch cluster can((("clusters", "indices in")))(((in clusters"))) contain multiple _indices_, which in turn contain multiple _types_.((("tables"))) These types hold multiple _documents_,
and each document has((("fields"))) multiple _fields_.
+一个 Elasticsearch 集群可以包含多个索引(indices),每个索引又包含多个类型(types)。这些类型存储着多个文档(documents),每个文档又有多个字段(fields)。

.Index Versus Index Versus Index
**************************************************

You may already have noticed that the word _index_ is overloaded with several meanings in the context of Elasticsearch.((("index, meanings in Elasticsearch"))) A little clarification is necessary:
+你也许已经注意到 index 这个词在 Elasticsearch 的语境中有多重含义。
+这里有必要做一些澄清。

Index (noun)::
+索引(名词):

As explained previously, an _index_ is like a _database_ in a traditional relational database. It is the place to store related documents. The plural of _index_ is _indices_ or _indexes_.
+如前所述,一个 index 如同传统关系型数据库中的一个数据库,是存储相关文档的地方。index 的复数形式为 indices 或 indexes。

Index (verb)::
+索引(动词):

_To index a document_ is to store a document in an _index (noun)_ so that it can be retrieved and queried. It is much like the `INSERT` keyword in SQL except that, if the document already exists, the new document would replace the old.
+“索引一个文档”就是把文档存储到一个索引(名词)中,以便它之后可以被检索(retrieved)和查询(queried)。这很像 SQL 中的 INSERT 关键字,区别在于:如果文档已经存在,新的文档会替换旧的。

Inverted index::
+倒排索引:

Relational databases add an _index_, such as a B-tree index,((("relational databases", "indices"))) to specific columns in order to improve the speed of data retrieval. Elasticsearch and
@@ -69,19 +92,29 @@ purpose.
By default, every field in a document is _indexed_ (has an inverted index) and thus is searchable. A field without an inverted index is not searchable. We discuss inverted indexes in more detail in <>.
+为了提高数据检索速度,关系型数据库会在特定的列上添加索引,例如 B-tree 索引。类似地,Elasticsearch 和 Lucene 使用一种叫做“倒排索引”(inverted index)的结构来达到相同目的。
+默认情况下,文档中的每个 field 都会被索引(拥有一个倒排索引),因而可被搜索。没有倒排索引的 field 无法被搜索。
+我们会在《倒排索引》这一章节中讨论更多细节。
**************************************************

So for our employee directory, we are going to do the following:
+关于我们的员工目录,我们将执行如下步骤:
+
-* Index a _document_ per employee, which contains all the details of a single
- employee.
-* Each document will be((("types", "in employee directory (example)"))) of _type_ `employee`.
+* Index a _document_ per employee, which contains all the details of a single
+ employee.
+* Each document will be((("types", "in employee directory (example)"))) of _type_ `employee`.
* That type will live in the `megacorp` _index_.
* That index will reside within our Elasticsearch cluster.
+* 为每个员工的文档建立索引,每个文档包含一个员工的所有信息;
+* 每个文档的类型为 `employee`;
+* `employee` 类型包含在 `megacorp` 索引下;
+* `megacorp` 索引保存在 Elasticsearch 集群中。

In practice, this is easy (even though it looks like a lot of steps). We can
perform all of those actions in a single command:
+实际上这些步骤(尽管看起来繁琐)做起来很容易,所有这些操作只需要一个命令即可完成:

[source,js]
--------------------------------------------------
PUT /megacorp/employee/1
@@ -98,26 +131,35 @@ Notice that the path `/megacorp/employee/1` contains three pieces of
information:
+请注意,路径 `/megacorp/employee/1` 包含了三条信息:

+megacorp+::
   The index name
++megacorp+: 索引名

+employee+::
   The type name
++employee+: 类型名

+1+::
   The ID of this particular employee
++1+: 这个特定员工的 ID

The request body--the JSON document--contains all the information about this
employee. His name is John Smith, he's 25, and enjoys rock climbing.
+请求体(即 JSON 文档)包含了这位员工的所有详细信息。他的名字叫约翰·史密斯,今年 25 岁,喜欢攀岩。

Simple! There was no need to perform any administrative tasks first, like
creating an index or specifying the type of data that each field contains. We
could just index a document directly. Elasticsearch ships with defaults for
everything, so all the necessary administration tasks were taken care of in
the background, using default values.
+很简单!我们无需预先执行任何管理性任务,如创建索引或指定每个 field 所包含的数据类型。
+我们可以直接索引一个文档。
+Elasticsearch 为一切都提供了默认值,所有必要的管理任务都在后台使用默认值完成。

Before moving on, let's add a few more employees to the directory:
+进行下一步工作前,让我们在目录中增加更多员工信息:

[source,js]
--------------------------------------------------
diff --git a/010_Intro/30_Tutorial_Search.asciidoc b/010_Intro/30_Tutorial_Search.asciidoc
index dfdd2781e..aba0fdceb 100644
--- a/010_Intro/30_Tutorial_Search.asciidoc
+++ b/010_Intro/30_Tutorial_Search.asciidoc
@@ -1,12 +1,18 @@
=== Retrieving a Document
+=== 检索文档

Now that we have some data stored in Elasticsearch,((("documents", "retrieving"))) we can get to work on the business requirements for this application. The first requirement is the ability to retrieve individual employee data.
+目前我们已经在 Elasticsearch 中存储了一些数据,接下来就可以着手实现这个应用的业务需求了。
+第一个需求——能够检索单个员工的数据。
+

This is easy in Elasticsearch. We simply execute((("HTTP requests", "retrieving a document with GET"))) an HTTP +GET+ request and specify the _address_ of the document--the index, type, and ID.((("id", "specifying in a request")))((("indices", "specifying index in a request")))((("types", "specifying type in a request"))) Using those three pieces of information, we can return the original JSON document:
+这在 Elasticsearch 中很简单。我们只需执行一个 HTTP `GET` 请求并指定文档的地址(索引、类型和 ID)。
+使用这三条信息,我们就能取回原始的 JSON 文档:

[source,js]
--------------------------------------------------
@@ -16,6 +22,7 @@ GET /megacorp/employee/1
And the response contains some metadata about the document, and John Smith's original JSON document ((("_source field", sortas="source field")))as the `_source` field:
+响应(response)包含了这个文档的一些元数据,以及作为 `_source` 字段的约翰·史密斯的原始 JSON 文档。

[source,js]
--------------------------------------------------
@@ -36,20 +43,26 @@ original JSON document ((("_source field", sortas="source field")))as the `_sour
--------------------------------------------------

[TIP]
+提示:
====
In the same way that we changed ((("HTTP methods")))the HTTP verb from `PUT` to `GET` in order to
retrieve the document, we could use the `DELETE` verb to delete the document, and the `HEAD` verb to check whether the document exists. To replace an existing document with an updated version, we just `PUT` it again.
====
+====
+如同我们把 HTTP 动词从 PUT 改为 GET 来检索文档一样,我们可以使用 DELETE 动词来删除文档,用 HEAD 动词来检查文档是否存在。要用更新后的版本替换已有文档,只需再次 PUT 即可。

=== Search Lite
+=== 轻量搜索

-A `GET` is fairly simple--you get back the document that you ask for.((("GET method")))((("searches", "simple search"))) Let's
-try something a little more advanced, like a simple search!
+A `GET` is fairly simple--you get back the document that you ask for.((("GET method")))((("searches", "simple search")))Let's try something a little more advanced, like a simple search!
+一个 GET 请求相当简单,你可以取回想要的文档。
+接下来,让我们尝试稍微高级一点的功能,比如一个简单的搜索!

The first search we will try is the simplest search possible. We will search for all employees, with this request:
+我们尝试的第一个搜索将是最简单的搜索。我们用如下请求搜索所有员工:

[source,js]
--------------------------------------------------
@@ -61,6 +74,8 @@ You can see that we're still using index `megacorp` and type `employee`, but
instead of specifying a document ID, we now use the `_search` endpoint. The
response includes all three of our documents in the `hits` array. By default,
a search will return the top 10 results.
+你可以看到,我们仍在使用索引 `megacorp` 和类型 `employee`,但不再指定某个文档的 ID,而是使用 `_search` 端点(endpoint)。响应在 `hits` 数组中包含了我们全部的三个文档。
+默认情况下,一个搜索会返回前 10 条结果。

[source,js]
--------------------------------------------------
@@ -119,11 +134,14 @@ a search will return the top 10 results.
NOTE: The response not only tells us which documents matched, but also
includes the whole document itself: all the information that we need in order to display the search results to the user.
+注意:响应不仅告诉我们哪些文档被匹配到,还包含了完整的文档本身:即向用户展示搜索结果所需的全部信息。

Next, let's try searching for employees who have ``Smith'' in their last name. To do this, we'll use a _lightweight_ search method that is easy to use from the command line.
This method is often referred to as ((("query strings")))a _query-string_ search, since we pass the search as a URL query-string parameter:
+下一步,让我们尝试搜索姓氏为“Smith”的员工。
+为此,我们将使用一种在命令行中易于使用的轻量级搜索方法。这种方法通常被称为查询字符串(query-string)搜索,因为我们把搜索条件作为 URL 的查询字符串参数传递:

[source,js]
--------------------------------------------------
@@ -133,6 +151,8 @@ GET /megacorp/employee/_search?q=last_name:Smith
We use the same `_search` endpoint in the path, and we add the query itself in the `q=` parameter. The results that come back show all Smiths:
+我们同样在路径中使用 `_search` 端点,并把查询本身放在 `q=` 参数中。
+返回的结果显示了所有姓 Smith 的员工:

[source,js]
--------------------------------------------------
@@ -168,14 +188,19 @@ the `q=` parameter. The results that come back show all Smiths:
--------------------------------------------------

=== Search with Query DSL
+=== 使用 Query DSL 来搜索

Query-string search is handy for ad hoc searches((("ad hoc searches"))) from the command line, but it has its limitations (see <>). Elasticsearch provides a rich, flexible, query language called the _query DSL_, which((("Query DSL"))) allows us to build much more complicated, robust queries.
+Query-string 搜索对于在命令行中进行临时(ad hoc)搜索非常方便,但它也有自身的局限性(见《轻量搜索》章节)。
+Elasticsearch 提供了一种丰富、灵活的查询语言,叫做 query DSL,它允许我们构建更复杂、更健壮的查询。

The _domain-specific language_ (DSL) is((("DSL (Domain Specific Language)"))) specified using a JSON request body. We can represent the previous search for all Smiths like so:
+领域特定语言(DSL)使用 JSON 请求体来指定。
+之前搜索所有姓 Smith 的查询可以这样表示:

[source,js]
+和之前一样,这个查询返回相同的结果,但你可以看到有些地方变了。首先,我们不再使用 query-string 参数,取而代之的是一个 JSON 请求体。其次,请求体使用了 `match` 查询(查询类型之一,后续内容中我们将详细介绍)。

=== More-Complicated Searches
+=== 更复杂的搜索

Let's make the search a little more complicated.((("searches", "more complicated")))((("filters"))) We still want to find all employees with a last name of Smith, but we want only employees who are older than 30. Our query will change a little to accommodate a _filter_, which allows us to execute structured searches efficiently:
+让我们试试更复杂的搜索吧:同样搜索出姓氏为 Smith 的员工,但这次我们只要其中年龄大于 30 的。我们的查询会稍作调整以容纳一个过滤器(filter),它使我们能高效地执行结构化搜索:

[source,js]
--------------------------------------------------
@@ -229,12 +257,15 @@ GET /megacorp/employee/_search
<1> This portion of the query is the((("match queries"))) same `match` _query_ that we used before.
<2> This portion of the query is a `range` _filter_, which((("range filters"))) will find all ages older than 30—`gt` stands for _greater than_.
+<1> 这部分查询与我们之前使用的 `match` 查询相同。
+<2> 这部分查询是一个 `range` 过滤器(filter),它能找到所有年龄大于 30 的文档,`gt` 表示“大于”(greater than)。

Don't worry about the syntax too much for now; we will cover it in great detail later. Just recognize that we've added a _filter_ that performs a range search, and reused the same `match` query as before. Now our results show only one employee who happens to be 32 and is named Jane Smith:
+现在先不用太担心语法,稍后我们会详细介绍。只需要知道我们添加了一个执行范围搜索的过滤器(filter),并复用了之前的 `match` 查询。现在,结果只显示了一位恰好 32 岁、名叫 Jane Smith 的员工:
+截至目前,搜索都相对简单:单个姓名,按年龄过滤。
+让我们尝试更高级的全文搜索,一项传统数据库真正难以搞定的任务。

We are going to search for all employees who enjoy rock climbing:
+我们将搜索所有喜欢攀岩的员工:

[source,js]
--------------------------------------------------
@@ -282,6 +317,7 @@ GET /megacorp/employee/_search
You can see that we use the same `match` query as before to search the `about` field for ``rock climbing''. We get back two matching documents:
+你可以看到,我们依旧使用之前的 `match` 查询,在 `about` 字段中搜索 "rock climbing"。我们得到了两个匹配的文档:

[source,js]
--------------------------------------------------
@@ -318,31 +354,38 @@ field for ``rock climbing''. We get back two matching documents:
}
--------------------------------------------------
<1> The relevance scores
+<1> 相关性分值

By default, Elasticsearch sorts((("relevance scores"))) matching results by their relevance score, that is, by how well each document matches the query. The first and highest-scoring result is obvious: John Smith's `about` field clearly says ``rock climbing'' in it.
+默认情况下,Elasticsearch 按相关性分值对匹配结果排序,也就是按每个文档与查询的匹配程度排序。第一个、分值最高的结果显而易见:John Smith 的 `about` 字段里明确写着 "rock climbing"。

But why did Jane Smith come back as a result? The reason her document was returned is because the word ``rock'' was mentioned in her `about` field. Because only ``rock'' was mentioned, and not ``climbing,'' her `_score` is lower than John's.
+但是为什么 Jane Smith 也出现在了结果里?她的文档之所以被返回,是因为 "rock" 这个词出现在了她的 `about` 字段中。又因为只提到了 "rock" 而没有 "climbing",她的 `_score` 低于 John 的。

This is a good example of how Elasticsearch can search _within_ full-text fields and return the most relevant results first. This ((("relevance", "importance to Elasticsearch")))concept of _relevance_ is important to Elasticsearch, and is a concept that is completely foreign to traditional relational databases, in which a record either matches or it doesn't.
+这个例子很好地说明了 Elasticsearch 能够在全文字段内进行搜索并优先返回相关性最强的结果。相关性(relevance)的概念对 Elasticsearch 非常重要,而在传统关系型数据库中这是一个完全陌生的概念:一条记录要么匹配查询,要么不匹配。

=== Phrase Search
+=== 短语搜索

Finding individual words in a field is all well and good, but sometimes you
-want to match exact sequences of words or _phrases_.((("phrase matching"))) For instance, we could
-perform a query that will match only employee records that contain both ``rock''
+want to match exact sequences of words or _phrases_.((("phrase matching"))) For instance, we could perform a query that will match only employee records that contain both ``rock''
_and_ ``climbing'' _and_ that display the words next to each other in the phrase ``rock climbing.''
+//
+在一个字段中找到单个的词没有问题,但有时你希望匹配确切的词序列,也就是短语(phrases)。例如,我们可以执行这样一个查询:只匹配同时包含 "rock" 和 "climbing"、并且两个词在短语 "rock climbing" 中紧挨着彼此的员工记录。

To do this, we use a slight variation of the `match` query called the `match_phrase` query:
+为此,我们使用 `match` 查询的一个变体,叫做 `match_phrase` 查询:

[source,js]
--------------------------------------------------
@@ -358,6 +401,7 @@ GET /megacorp/employee/_search
// SENSE: 010_Intro/30_Query_DSL.json
This, to no surprise, returns only John Smith's document:
+毫无悬念,这次返回的结果只有 John Smith 一个人的文档:

[source,js]
--------------------------------------------------
@@ -385,12 +429,15 @@ This, to no surprise, returns only John Smith's document:
[[highlighting-intro]]
=== Highlighting Our Searches
+=== 高亮我们的搜索

Many applications like to _highlight_ snippets((("searches", "highlighting search results")))((("highlighting searches"))) of text from each search result so the user can see _why_ the document matched the query. Retrieving highlighted fragments is easy in Elasticsearch.
+许多应用喜欢从每个搜索结果中高亮(highlight)文本片段,以便用户知道文档为什么与查询匹配。在 Elasticsearch 中检索高亮片段很容易。

Let's rerun our previous query, but add a new `highlight` parameter:
+让我们重新运行之前的查询,但增加一个新的 `highlight` 参数:

[source,js]
--------------------------------------------------
@@ -414,6 +461,8 @@ When we run this query, the same hit is returned as before, but now we get a new section in the response called `highlight`. This contains a snippet of text from the `about` field with the matching words wrapped in `` HTML tags:
+当我们运行这个查询时,返回的命中结果与之前相同,但响应中会多出一个名为 `highlight` 的新部分。它包含了来自 `about` 字段的一段文本片段,其中匹配的词被 HTML 标签包裹以实现高亮:
+

[source,js]
--------------------------------------------------
@@ -445,6 +494,8 @@ HTML tags:
--------------------------------------------------
<1> The highlighted fragment from the original text
+<1> 原始文本中高亮的片段

You can read more about the highlighting of search snippets in the {ref}/search-request-highlighting.html[highlighting reference documentation].
+关于搜索片段的高亮,你可以在 {ref}/search-request-highlighting.html[highlighting reference documentation] 中阅读更多信息。
diff --git a/010_Intro/35_Tutorial_Aggregations.asciidoc b/010_Intro/35_Tutorial_Aggregations.asciidoc
index 47429874c..b58c46374 100644
--- a/010_Intro/35_Tutorial_Aggregations.asciidoc
+++ b/010_Intro/35_Tutorial_Aggregations.asciidoc
@@ -1,11 +1,13 @@
=== Analytics
+=== 分析

Finally, we come to our last business requirement: allow managers to run analytics over the employee directory.((("analytics"))) Elasticsearch has functionality called
-_aggregations_, which ((("aggregations")))allow you to generate sophisticated analytics over your
-data. It is similar to `GROUP BY` in SQL, but much more powerful.
+_aggregations_, which ((("aggregations")))allow you to generate sophisticated analytics over your data. It is similar to `GROUP BY` in SQL, but much more powerful.
+最后,我们来看最后一个业务需求:允许管理者针对员工目录运行分析。Elasticsearch 有一项叫做聚合(aggregations)的功能,它允许你基于数据生成复杂精细的分析统计。聚合与 SQL 中的 `GROUP BY` 类似,但更加强大。

For example, let's find the most popular interests enjoyed by our employees:
+举个例子,让我们找出员工中最受欢迎的兴趣爱好:

[source,js]
--------------------------------------------------
@@ -21,6 +23,7 @@ GET /megacorp/employee/_search
// SENSE: 010_Intro/35_Aggregations.json
Ignore the syntax for now and just look at the results:
+暂时忽略语法,直接看结果:

[source,js]
--------------------------------------------------
@@ -53,6 +56,7 @@ in sports. These aggregations are not precalculated; they are generated on the fly from the documents that match the current query. If we want to know the popular interests of people called Smith, we can just add the appropriate query into the mix:
+我们可以看到,两位员工对音乐感兴趣,一位对林业感兴趣,一位对运动感兴趣。这些聚合结果并不是预先计算好的,而是根据匹配当前查询的文档即时生成的。如果我们想知道叫 Smith 的员工中最受欢迎的兴趣爱好,只需把相应的查询组合进去:

[source,js]
--------------------------------------------------
@@ -75,6 +79,7 @@ GET /megacorp/employee/_search
// SENSE: 010_Intro/35_Aggregations.json
The `all_interests` aggregation has changed to include only documents matching our query:
+`all_interests` 聚合已经变为只包含匹配我们查询的文档:

[source,js]
--------------------------------------------------
@@ -95,6 +100,8 @@ The `all_interests` aggregation has changed to include only documents matching o
Aggregations allow hierarchical rollups too.((("aggregations", "hierarchical rollups in"))) For example, let's find the average age of employees who share a particular interest:
+聚合也支持分级汇总。
+举个例子,让我们找出拥有某项特定兴趣的员工的平均年龄:

[source,js]
--------------------------------------------------
@@ -116,6 +123,7 @@ GET /megacorp/employee/_search
The aggregations that we get back are a bit more complicated, but still fairly easy to understand:
+返回的聚合结果稍微复杂了些,不过理解起来还是相当容易的:

[source,js]
--------------------------------------------------
@@ -151,6 +159,11 @@ The output is basically an enriched version of the first aggregation we ran. We still have a list of interests and their counts, but now each interest has an additional `avg_age`, which shows the average age for all employees having that interest.
+这个输出基本上是我们之前运行的第一个聚合的加强版。
+我们依旧得到了兴趣及其数量的列表,只不过每个兴趣现在都多了一个 `avg_age` 值,表示拥有该兴趣的所有员工的平均年龄。

Even if you don't understand the syntax yet, you can easily see how complex aggregations and groupings can be accomplished using this feature. The sky is the limit as to what kind of data you can extract!
+即使你还不理解这些语法,你也可以很容易地看出,借助这个特性可以完成多么复杂的聚合和分组操作。
+你能提取什么样的数据没有任何限制!
+
diff --git a/010_Intro/40_Tutorial_Conclusion.asciidoc b/010_Intro/40_Tutorial_Conclusion.asciidoc
index a0e5394e4..5bb1fcbcf 100644
--- a/010_Intro/40_Tutorial_Conclusion.asciidoc
+++ b/010_Intro/40_Tutorial_Conclusion.asciidoc
@@ -1,11 +1,17 @@
=== Tutorial Conclusion
+=== 教程结语

Hopefully, this little tutorial was a good demonstration about what is possible in Elasticsearch. It is really just scratching the surface, and many features--such as suggestions, geolocation, percolation, fuzzy and partial matching--were omitted to keep the tutorial short. But it did highlight just how easy it is to start building advanced search functionality. No configuration was needed--just add data and start searching!
+希望这个简短的教程很好地展示了 Elasticsearch 的种种可能。
+这当然只是浅尝辄止,为了保持教程简短,诸如 suggestions、geolocation、percolation、模糊与部分匹配等许多特性都被省略了。
+但它确实展示了开始构建高级搜索功能有多么容易。无需配置,只需添加数据并开始搜索!
+

It's likely that the syntax left you confused in places, and you may have questions about how to tweak and tune various aspects. That's fine! The rest of the book dives into each of these issues in detail, giving you a solid understanding of how Elasticsearch works.
+你很可能对其中一些语法感到困惑,也会有如何对各个方面进行调整和调优的疑问。没关系!本书的其余部分会深入探讨这些问题的细节,让你扎实地理解 Elasticsearch 的工作原理。
diff --git a/010_Intro/45_Distributed.asciidoc b/010_Intro/45_Distributed.asciidoc
index 96f3ce298..9768be414 100644
--- a/010_Intro/45_Distributed.asciidoc
+++ b/010_Intro/45_Distributed.asciidoc
@@ -1,10 +1,14 @@
=== Distributed Nature
+=== 分布式特性

At the beginning of this chapter, we said that Elasticsearch((("distributed nature of Elasticsearch"))) can scale out to hundreds (or even thousands) of servers and handle petabytes of data. While our tutorial gave examples of how to use Elasticsearch, it didn't touch on the mechanics at all. Elasticsearch is distributed by nature, and it is designed to hide the complexity that comes with being distributed.
+在本章开头,我们说过 Elasticsearch 可以横向扩展到数百台(甚至数千台)服务器,处理 PB 级的数据。
+虽然教程给出了如何使用 Elasticsearch 的示例,但完全没有涉及其内部机制。
+Elasticsearch 天生就是分布式的,并且在设计上隐藏了分布式带来的复杂性。

The distributed aspect of Elasticsearch is largely transparent. Nothing in the tutorial required you to know about distributed systems, sharding, cluster
@@ -12,24 +16,32 @@ discovery, or dozens of other distributed concepts. It happily ran the tutorial on a single node living inside your laptop, but if you were to run the tutorial on a cluster containing 100 nodes, everything would work in exactly the same way.
+Elasticsearch 的分布式特性在很大程度上是透明的。教程中没有任何内容要求你了解分布式系统、分片、集群发现或其他数十个分布式概念。
+教程可以愉快地在你笔记本电脑里的单个节点上运行,而如果你要在一个包含 100 个节点的集群上运行它,一切的工作方式完全相同。

Elasticsearch tries hard to hide the complexity of distributed systems.
Here are some of the operations happening automatically under the hood:
+Elasticsearch 尽其所能来隐藏分布式系统的复杂性。这里列举了一些在底层自动进行的操作:

* Partitioning your documents into different containers((("documents", "partitioning into shards")))((("shards"))) or _shards_, which can be stored on a single node or on multiple nodes
+ * 将你的文档分区到不同的容器或分片(shards)中,它们可以存储在单个节点或多个节点上;

* Balancing these shards across the nodes in your cluster to spread the indexing and search load
+ * 在集群的节点间均衡分布这些分片,以分散索引和搜索的负载;

* Duplicating each shard to provide redundant copies of your data, to prevent data loss in case of hardware failure
+ * 复制每个分片以提供数据的冗余副本,防止硬件故障导致数据丢失;

* Routing requests from any node in the cluster to the nodes that hold the data you're interested in
+ * 将来自集群中任一节点的请求路由到持有你所需数据的节点;

* Seamlessly integrating new nodes as your cluster grows or redistributing shards to recover from node loss
+ * 随着集群的增长无缝整合新节点,并在节点丢失时重新分配分片以恢复数据。

As you read through this book, you'll encounter supplemental chapters about the distributed nature of Elasticsearch. These chapters will teach you about
@@ -37,9 +49,10 @@ how the cluster scales and deals with failover (<>), handles document storage (<>), executes distributed search (<>), and what a shard is and how it works (<>).
+当你阅读本书时,你会遇到有关 Elasticsearch 分布式特性的补充章节。这些章节将教你集群如何扩展以及如何处理故障转移、如何处理文档存储、如何执行分布式搜索,以及什么是分片(shard)和它如何工作。

These chapters are not required reading--you can use Elasticsearch without understanding these internals--but they will provide insight that will make your knowledge of Elasticsearch more complete. Feel free to skim them and revisit at a later point when you need a more complete understanding.
-
+这些章节并非必读,即使不理解这些内部机制,你也可以使用 Elasticsearch,但它们提供的洞察会让你对 Elasticsearch 的认识更加完整。你可以随意浏览它们,等以后需要更深入的理解时再回头细读。
diff --git a/010_Intro/50_Conclusion.asciidoc b/010_Intro/50_Conclusion.asciidoc
index b386d3716..db1d2db88 100644
--- a/010_Intro/50_Conclusion.asciidoc
+++ b/010_Intro/50_Conclusion.asciidoc
@@ -1,13 +1,17 @@
=== Next Steps
+=== 后续步骤

By now you should have a taste of what you can do with Elasticsearch, and how easy it is to get started. Elasticsearch tries hard to work out of the box with minimal knowledge and configuration. The best way to learn Elasticsearch is by jumping in: just start indexing and searching!
+相信现在你已经对 Elasticsearch 能实现什么功能,以及上手的简易程度有了初步概念。Elasticsearch 努力以最少的知识和配置达到开箱即用的效果。
+学习 Elasticsearch 最好的方式就是直接上手:开始索引和搜索吧!

However, the more you know about Elasticsearch, the more productive you can become. The more you can tell Elasticsearch about the domain-specific elements of your application, the more you can fine-tune the output.
+然而,你对 Elasticsearch 了解得越多,效率就越高。你能告诉 Elasticsearch 的关于你应用领域的信息越多,就越能对输出结果进行微调。

The rest of this book will help you move from novice to expert. Each chapter explains the essentials, but also includes expert-level tips. If you're just getting started, these tips are probably not immediately relevant to you; Elasticsearch has sensible defaults and will generally do the right thing without any interference. You can always revisit these chapters later, when you are looking to improve performance by shaving off any wasted milliseconds.
+本书的其余部分将帮助你从新手成长为专家。每一章既讲解要点,也包含专家级的建议。如果你刚刚入门,这些建议可能暂时与你无关;Elasticsearch 有合理的默认值,通常无需任何干预就能做正确的事。当你希望通过削减每一毫秒的浪费来提升性能时,随时可以回头重读这些章节。
diff --git a/130_Partial_Matching/05_Postcodes.asciidoc b/130_Partial_Matching/05_Postcodes.asciidoc
index d3a47a907..18b27e22f 100644
--- a/130_Partial_Matching/05_Postcodes.asciidoc
+++ b/130_Partial_Matching/05_Postcodes.asciidoc
@@ -1,22 +1,19 @@
-=== Postcodes and Structured Data
+=== 邮编与结构化数据

-We will use United Kingdom postcodes (postal codes in the United States) to illustrate how((("partial matching", "postcodes and structured data"))) to use partial matching with
-structured data. UK postcodes have a well-defined structure. For instance, the
-postcode `W1V 3DG` can((("postcodes (UK), partial matching with"))) be broken down as follows:
+我们会使用英国邮编(即美国所说的 postal codes)来说明如何用部分匹配查询结构化数据。((("partial matching", "postcodes and structured data")))英国邮编有明确定义的结构。例如,邮编 `W1V 3DG` 可以分解成如下形式:((("postcodes (UK), partial matching with")))

-* `W1V`: This outer part identifies the postal area and district:
+* `W1V` :这是邮编的外部,它定义了邮件的区域和行政区:

-** `W` indicates the area (one or two letters)
-** `1V` indicates the district (one or two numbers, possibly followed by a letter)
+** `W` 代表区域( 1 或 2 个字母)
+** `1V` 代表行政区( 1 或 2 个数字,可能跟着一个字母)

-* `3DG`: This inner part identifies a street or building:
+* `3DG` :内部定义了街道或建筑:

-** `3` indicates the sector (one number)
-** `DG` indicates the unit (two letters)
+** `3` 代表街区区块( 1 个数字)
+** `DG` 代表单元( 2 个字母)

-Let's assume that we are indexing postcodes as exact-value `not_analyzed`
-fields, so we could create our index as follows:
+假设将邮编作为 `not_analyzed` 的精确值字段索引,所以可以为其创建索引,如下:

[source,js]
-------------------------------------------------- @@ -57,4 +54,4 @@ PUT /my_index/address/5 -------------------------------------------------- // SENSE: 130_Partial_Matching/10_Prefix_query.json -Now our data is ready to be queried. +现在这些数据已可查询。 diff --git a/270_Fuzzy_matching/40_Fuzzy_match_query.asciidoc b/270_Fuzzy_matching/40_Fuzzy_match_query.asciidoc index 2a32ef3a4..8dc1ac471 100644 --- a/270_Fuzzy_matching/40_Fuzzy_match_query.asciidoc +++ b/270_Fuzzy_matching/40_Fuzzy_match_query.asciidoc @@ -1,7 +1,7 @@ [[fuzzy-match-query]] -=== Fuzzy match Query +=== 模糊匹配查询 -The `match` query supports ((("typoes and misspellings", "fuzzy match query")))((("match query", "fuzzy matching")))((("fuzzy matching", "match query")))fuzzy matching out of the box: +`match` 查询支持((("typoes and misspellings", "fuzzy match query")))((("match query", "fuzzy matching")))((("fuzzy matching", "match query")))开箱即用的模糊匹配: [source,json] ----------------------------------- @@ -19,11 +19,9 @@ GET /my_index/my_type/_search } ----------------------------------- -The query string is first analyzed, to produce the terms `[surprize, me]`, and -then each term is fuzzified using the specified `fuzziness`. +查询字符串首先进行分析,会产生词项 `[surprize, me]` ,并且每个词项根据指定的 `fuzziness` 进行模糊化。 -Similarly, the `multi_match` query also ((("multi_match queries", "fuzziness support")))supports `fuzziness`, but only when -executing with type `best_fields` or `most_fields`: +同样, `multi_match` 查询也((("multi_match queries", "fuzziness support")))支持 `fuzziness` ,但只有当执行查询时类型是 `best_fields` 或者 `most_fields` : [source,json] ----------------------------------- @@ -39,9 +37,6 @@ GET /my_index/my_type/_search } ----------------------------------- -Both the `match` and `multi_match` queries also support the `prefix_length` -and `max_expansions` parameters. - -TIP: Fuzziness works only with the basic `match` and `multi_match` queries. It -doesn't work with phrase matching, common terms, or `cross_fields` matches. 
+`match` 和 `multi_match` 查询都支持 `prefix_length` 和 `max_expansions` 参数。
+TIP: 模糊性( fuzziness )只能在基本的 `match` 和 `multi_match` 查询中使用,不能用于短语匹配、常用词项或 `cross_fields` 匹配。
diff --git a/400_Relationships/20_Denormalization.asciidoc b/400_Relationships/20_Denormalization.asciidoc
index 9b72605f5..2a39c4e91 100644
--- a/400_Relationships/20_Denormalization.asciidoc
+++ b/400_Relationships/20_Denormalization.asciidoc
@@ -1,15 +1,11 @@
[[denormalization]]
-=== Denormalizing Your Data
+=== 非规范化你的数据

-The way to get the best search performance out of Elasticsearch is to use it
-as it is intended, by((("relationships", "denormalizing your data")))((("denormalization", "denormalizing data at index time")))
-http://en.wikipedia.org/wiki/Denormalization[denormalizing] your data at index
-time. Having redundant copies of data in each document that requires access to
-it removes the need for joins.
-If we want to be able to find a blog post by the name of the user who wrote it,
-include the user's name in the blog-post document itself:
+从 Elasticsearch 获得最好搜索性能的方法是按其设计意图使用它,即在索引时对数据进行((("relationships", "denormalizing your data")))((("denormalization", "denormalizing data at index time")))
+http://en.wikipedia.org/wiki/Denormalization[非规范化(denormalizing)]。在每个需要访问某份数据的文档中保存该数据的冗余副本,就不再需要联接操作。
+如果我们希望能够通过某个用户姓名找到他写的博客文章,可以在博客文档中包含这个用户的姓名:

[source,json]
--------------------------------
@@ -30,10 +26,9 @@ PUT /my_index/blogpost/2
}
}
--------------------------------
-<1> Part of the user's data has been denormalized into the `blogpost` document.
+<1> 这部分用户的数据已被冗余到 `blogpost` 文档中。

-Now, we can find blog posts about `relationships` by users called `John`
-with a single query:
+现在,我们只需一次查询,就能找到名为 `John` 的用户所写的、关于 `relationships` 的博客文章:

[source,json]
--------------------------------
@@ -50,7 +45,4 @@ GET /my_index/blogpost/_search
}
--------------------------------
-The advantage of data denormalization is speed.
Because each document -contains all of the information that is required to determine whether it -matches the query, there is no need for expensive joins. - +数据非规范化的优点是速度快。因为每个文档都包含了所需的所有信息,当这些信息需要在查询进行匹配时,并不需要进行昂贵的联接操作。 From 21d77a096d15bb736ace3a7744680ce772e47e52 Mon Sep 17 00:00:00 2001 From: Josephjin Date: Wed, 7 Sep 2016 10:39:22 +0800 Subject: [PATCH 2/8] Revert "chapter/chapter10_25-50" This reverts commit 18b8647c5812721adea56cebb0efe33bf1167a93. --- 010_Intro/25_Tutorial_Indexing.asciidoc | 52 ++-------------- 010_Intro/30_Tutorial_Search.asciidoc | 59 ++----------------- 010_Intro/35_Tutorial_Aggregations.asciidoc | 17 +----- 010_Intro/40_Tutorial_Conclusion.asciidoc | 6 -- 010_Intro/45_Distributed.asciidoc | 15 +---- 010_Intro/50_Conclusion.asciidoc | 5 -- 130_Partial_Matching/05_Postcodes.asciidoc | 25 ++++---- .../40_Fuzzy_match_query.asciidoc | 17 ++++-- 400_Relationships/20_Denormalization.asciidoc | 22 ++++--- 9 files changed, 52 insertions(+), 166 deletions(-) diff --git a/010_Intro/25_Tutorial_Indexing.asciidoc b/010_Intro/25_Tutorial_Indexing.asciidoc index 0ac6cbcd1..4bfe58049 100644 --- a/010_Intro/25_Tutorial_Indexing.asciidoc +++ b/010_Intro/25_Tutorial_Indexing.asciidoc @@ -3,28 +3,21 @@ To give you a feel for what is possible in Elasticsearch and how easy it is to use, let's start by walking through a simple tutorial that covers basic concepts such as indexing, search, and aggregations. -为了弄清楚 Elasticsearch 能实现什么以及实现的简易程度,让我们从一个简单的教程开始并介绍诸如索引、搜索以及聚合等基础概念。 We'll introduce some new terminology and basic concepts along the way, but it is OK if you don't understand everything immediately. We'll cover all the concepts introduced here in _much_ greater depth throughout the rest of the book. -我们将结合一些新的技术术语以及基础概念介绍,当然即使现在你无法全盘理解,这并不妨碍你着手学习 Elasticsearch。 -在本书后续内容中,我们将以更深入的角度去介绍以及剖析所有知识点。 So, sit back and enjoy a whirlwind tour of what Elasticsearch is capable of. 
-接下来,让我们快速入门 Elasticsearch。 ==== Let's Build an Employee Directory -==== 第一步:让我们创建一个雇员目录 We happen((("employee directory, building (example)"))) to work for _Megacorp_, and as part of HR's new _"We love our drones!"_ initiative, we have been tasked with creating an employee directory. The directory is supposed to foster employer empathy and real-time, synergistic, dynamic collaboration, so it has a few business requirements: -撰写此书时,我们正好受聘于 Megacorp 公司。同时作为 HR 部门一项新的激励项目“无人机精神”的部分内容,我们需要为此创建一个雇员目录。这项目录应当能支持团队实时协作、动态更新来提高团队沟通效能,因此,它拓展了一些业务性需求: - * Enable data to contain multi value tags, numbers, and full text. * Retrieve the full details of any employee. @@ -33,27 +26,18 @@ business requirements: * Return highlighted search _snippets_ from the text in the matching documents. * Enable management to build analytic dashboards over the data. -* 使数据包含多个标签、数值、以及全文链接 -* 检索任一员工的完整个人信息 -* 允许结构化搜索,诸如找到年纪在 30 岁以上的员工 -* 允许简单的全文搜索以及更复杂的短语搜索 -* 在匹配的文档内容中返回搜索的片断文本 -* 基于数据,创建并管理分析仪表盘 - === Indexing Employee Documents -=== 第二步:索引员工文档 The first order of business is storing employee data.((("documents", "indexing")))((("indexing"))) This will take the form of an _employee document_: a single document represents a single employee. The act of storing data in Elasticsearch is called _indexing_, but before we can index a document, we need to decide _where_ to store it. -首先的需求是以表格的方式(即:索引&文档)存储员工数据,其次每个表格可独立存储一个员工数据,在 Elasticsearch 存储数据的行为叫做索引(即 indexing),但是在索引前我们需要决定将文档存储在哪里。 -An Elasticsearch cluster can((("clusters", "indices in")))(((in clusters"))) contain multiple _indices_, which in turn contain multiple _types_.((("tables"))) These types hold multiple _documents_, +An Elasticsearch cluster can((("clusters", "indices in")))(((in clusters"))) contain multiple _indices_, which in +turn contain multiple _types_.((("tables"))) These types hold multiple _documents_, and each document has((("fields"))) multiple _fields_. 
-一个 Elasticsearch 集群可以包含多重目录,由此可知集群的多样性特点。不同类型的集群管理不同的文档,每个文档有不同的 field。 .Index Versus Index Versus Index ************************************************** @@ -61,28 +45,21 @@ and each document has((("fields"))) multiple _fields_. You may already have noticed that the word _index_ is overloaded with several meanings in the context of Elasticsearch.((("index, meanings in Elasticsearch"))) A little clarification is necessary: -你也许已经注意到 index 这个词在 Elasticsearch 内容中包含多重意思。 -很显然,现在我们需要对 index 作精简的说明和注释。 Index (noun):: -索引(作名词用): As explained previously, an _index_ is like a _database_ in a traditional relational database. It is the place to store related documents. The plural of _index_ is _indices_ or _indexes_. -如上所述,一个 index 如同一个传统的关系数据库。在这里,你可以存储相关文档。index 的复数词为 indices 或 indexes。 Index (verb):: -编入索引(作动词用): _To index a document_ is to store a document in an _index (noun)_ so that it can be retrieved and queried. It is much like the `INSERT` keyword in SQL except that, if the document already exists, the new document would replace the old. -将一个文档编入索引中,即在一个索引(作名词用)中进行存储行为。因此,“编入索引”这个动作是可逆(retrieved)且可查询的(queried)。就像 SQL 中的 INSERT 关键字,如果文档已经存在,那么新的文档会覆盖旧的。 Inverted index:: -反向索引: Relational databases add an _index_, such as a B-tree index,((("relational databases", "indices"))) to specific columns in order to improve the speed of data retrieval. Elasticsearch and @@ -92,29 +69,19 @@ purpose. By default, every field in a document is _indexed_ (has an inverted index) and thus is searchable. A field without an inverted index is not searchable. We discuss inverted indexes in more detail in <>. 
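The inverted index described above can be sketched in a few lines of Python. This is only a conceptual toy, not Lucene's actual data structure (real indexes add analysis, term dictionaries, postings compression, and much more); the sample documents are made up for illustration:

```python
# A toy inverted index: map each term to the set of document IDs containing it.
docs = {
    1: "We love rock climbing",
    2: "rock music is great",
}

index = {}
for doc_id, text in docs.items():
    for term in text.lower().split():
        index.setdefault(term, set()).add(doc_id)

# A term lookup is now a single dictionary access instead of a scan of every document.
print(sorted(index["rock"]))                  # documents containing "rock"
print(sorted(index.get("climbing", set())))   # documents containing "climbing"
```

Because every field is indexed this way by default, every field is searchable with a cheap lookup rather than a full scan.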
-为了提高数据检索速度,我们往往会在相关的数据库中增加一个索引诸如二叉树索引结构来明确向量。类似地,Elasticsearch 以及 Lucene 使用一种名叫“反向索引”(inverted index)的结构来达到相同目的。 -默认地,文件中的每个 field 是可索引的(具备一个反向索引)且可被搜索。不具备反向索引的 field 无法被搜索。 -我们会在《反向索引》这一章节中讨论更多细节。 ************************************************** So for our employee directory, we are going to do the following: -关于我们的员工目录,我们将执行如下步骤: - -* Index a _document_ per employee, which contains all the details of a single - employee. -* Each document will be((("types", "in employee directory (example)"))) of _type_ `employee`. +* Index a _document_ per employee, which contains all the details of a single + employee. +* Each document will be((("types", "in employee directory (example)"))) of _type_ `employee`. * That type will live in the `megacorp` _index_. * That index will reside within our Elasticsearch cluster. -* 为每个员工创建文档并编入索引,其中包含了员工的所有信息; -* 每个文档的类型为 'employee'; -* 'employee' 类型包含在 'megacorp' 索引下; -* 'megacorp' 索引保存在 Elasticsearch 集群中。 In practice, this is easy (even though it looks like a lot of steps). We can perform all of those actions in a single command: -实际上这些步骤(尽管看起来繁琐)实践起来并不难,我们可以在一个单独指令中演示完毕。 [source,js] -------------------------------------------------- @@ -131,35 +98,26 @@ PUT /megacorp/employee/1 Notice that the path `/megacorp/employee/1` contains three pieces of information: -请注意,路径`/megacorp/employee/1`包含了三条信息: +megacorp+:: The index name -+megacorp+: 索引名字 +employee+:: The type name -+employee+: 类型名 +1+:: The ID of this particular employee -+1+: 每位员工的特定编码 The request body--the JSON document--contains all the information about this employee. His name is John Smith, he's 25, and enjoys rock climbing. -而作为请求的 JSON 文档,包含了这位员工的所有详细信息。他的名字叫约翰·史密斯,今年 25 岁,喜欢攀岩。 Simple! There was no need to perform any administrative tasks first, like creating an index or specifying the type of data that each field contains. We could just index a document directly. 
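As a rough sketch in plain Python (no Elasticsearch client; the index, type, ID, and document fields are taken from the tutorial itself), the `PUT /megacorp/employee/1` request is just a JSON body sent to an address made of those three pieces:

```python
import json

# The employee document from the tutorial, as a plain dict.
employee = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}

# The address /megacorp/employee/1 breaks down into index / type / ID.
index_name, type_name, doc_id = "megacorp", "employee", 1
path = "/{}/{}/{}".format(index_name, type_name, doc_id)

body = json.dumps(employee)   # the request body is the document itself
print("PUT", path)            # PUT /megacorp/employee/1
```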
Elasticsearch ships with defaults for everything, so all the necessary administration tasks were taken care of in the background, using default values. -很简单对不对?为此,我们没必要再执行管理规范性的事务,如创建索引或指定每个 field 所包含的数据类型。 -我们可以直接索引一个文档。 -所有规范性的任务 Elasticsearch 会默认匹配,因此繁琐的基础操作都将在后台完成。 Before moving on, let's add a few more employees to the directory: -进行下一步工作前,让我们在目录中增加更多员工信息: [source,js] -------------------------------------------------- diff --git a/010_Intro/30_Tutorial_Search.asciidoc b/010_Intro/30_Tutorial_Search.asciidoc index aba0fdceb..dfdd2781e 100644 --- a/010_Intro/30_Tutorial_Search.asciidoc +++ b/010_Intro/30_Tutorial_Search.asciidoc @@ -1,18 +1,12 @@ === Retrieving a Document -=== 检索文档 Now that we have some data stored in Elasticsearch,((("documents", "retrieving"))) we can get to work on the business requirements for this application. The first requirement is the ability to retrieve individual employee data. -目前我们已经在 Elasticsearch 中存储了一些数据,接下来我们将在此基础上拓展业务性需求。 -第一个要求——检索单个员工的数据。 - This is easy in Elasticsearch. 
We simply execute((("HTTP requests", "retrieving a document with GET"))) an HTTP +GET+ request and specify the _address_ of the document--the index, type, and ID.((("id", "specifying in a request")))((("indices", "specifying index in a request")))((("types", "specifying type in a request"))) Using those three pieces of information, we can return the original JSON document: -这在 Elasticsearch 中很简单。让我们通过执行简单的 HTTP+GET 指令并明确文档地址(索引名、类型以及雇员编码) -使用这三条核心信息,检索操作将返回原始的 JSON 文档。 [source,js] -------------------------------------------------- @@ -22,7 +16,6 @@ GET /megacorp/employee/1 And the response contains some metadata about the document, and John Smith's original JSON document ((("_source field", sortas="source field")))as the `_source` field: -同时,response 包含了关于文档的一些元数据以及约翰·史密斯本人的原始 JSON 文档。 [source,js] -------------------------------------------------- @@ -43,26 +36,20 @@ original JSON document ((("_source field", sortas="source field")))as the `_sour -------------------------------------------------- [TIP] -提示: ==== In the same way that we changed ((("HTTP methods")))the HTTP verb from `PUT` to `GET` in order to retrieve the document, we could use the `DELETE` verb to delete the document, and the `HEAD` verb to check whether the document exists. To replace an existing document with an updated version, we just `PUT` it again. ==== -==== -在上述操作中,我们将 PUT 指令修改为 GET 来检索文档。同样地,我们可以使用 DELETE 指令来删除文档,HEAD 指令来检查文档是否存在。为了升级并替换已有文档,我们只需要使用 PUT 指令植入升级版本即可。 === Search Lite -=== 搜索简化版本 -A `GET` is fairly simple--you get back the document that you ask for.((("GET method")))((("searches", "simple search")))Let's try something a little more advanced, like a simple search! -一个 GET 指令相当简单,由此你可以获取想要检索的文档。 -接下来,让我们尝试更多高级的功能,比如一个简单的搜索。 +A `GET` is fairly simple--you get back the document that you ask for.((("GET method")))((("searches", "simple search"))) Let's +try something a little more advanced, like a simple search! The first search we will try is the simplest search possible. 
We will search for all employees, with this request: -初次搜索,我们将尽其可能地简单快速。通过以下指令,我们可以搜索所有员工信息: [source,js] -------------------------------------------------- @@ -74,8 +61,6 @@ You can see that we're still using index `megacorp` and type `employee`, but instead of specifying a document ID, we now use the `_search` endpoint. The response includes all three of our documents in the `hits` array. By default, a search will return the top 10 results. -你可以看到,我们还在使用索引指令 'megacorp' 以及类型名 'employee',但是与明确文档编码不同,我们现在使用 'search' 指令作为**地址终点符**。Response 在 'hits' 数组中包含了所有三个文档 -默认地,一个搜索可以返回头十位结果。 [source,js] -------------------------------------------------- @@ -134,14 +119,11 @@ a search will return the top 10 results. NOTE: The response not only tells us which documents matched, but also includes the whole document itself: all the information that we need in order to display the search results to the user. -注意:response 不仅仅能告知哪些文档匹配要求,同时包含了文档源文件的信息,以向用户展示最终搜索结果。 Next, let's try searching for employees who have ``Smith'' in their last name. To do this, we'll use a _lightweight_ search method that is easy to use from the command line. This method is often referred to as ((("query strings")))a _query-string_ search, since we pass the search as a URL query-string parameter: -下一步,让我们来尝试搜索哪些员工的姓氏为“Simth”。 -为了完成这一步操作,我们将使用 lightweight 搜索法,同时这在命令行中可以轻松获得。 [source,js] -------------------------------------------------- @@ -151,8 +133,6 @@ GET /megacorp/employee/_search?q=last_name:Smith We use the same `_search` endpoint in the path, and we add the query itself in the `q=` parameter. The results that come back show all Smiths: -我们同样在路径中使用 search 作为终点符,同时我们通过查询参数 'q=' 中添加标准 last_name:Smith。 -由此,检索结果返回所有与 Smith 相关的内容: [source,js] -------------------------------------------------- @@ -188,19 +168,14 @@ the `q=` parameter. 
The results that come back show all Smiths: -------------------------------------------------- === Search with Query DSL -=== 使用 Query DSL 来搜索 Query-string search is handy for ad hoc searches((("ad hoc searches"))) from the command line, but it has its limitations (see <>). Elasticsearch provides a rich, flexible, query language called the _query DSL_, which((("Query DSL"))) allows us to build much more complicated, robust queries. -Query-string 命令行对于点对点搜索来说是易于上手的,但是它也有自身的局限性(详情见《搜索简化》章节)。 -Elasticsearch 提供一个丰富的、流畅的 query 语言:query DSL。它使我们有能力创建更加复杂、可靠的查询体系。 The _domain-specific language_ (DSL) is((("DSL (Domain Specific Language)"))) specified using a JSON request body. We can represent the previous search for all Smiths like so: -DSL,作为一门特定领域语言,指定使用一个 JSON 请求。 -关于 Smith 作为姓氏条件进行搜索,我们可以如此展示: [source,js] @@ -221,16 +196,13 @@ number of things have changed. For one, we are no longer using _query-string_ parameters, but instead a request body. This request body is built with JSON, and uses a `match` query (one of several types of queries, which we will learn about later). -如先前查询方式,我们将获得相同结果。但是你可以看到,一些指令已经发生变化。举个例子,我们不再使用 query-string 参数,而由一个JSON请求作为替代方案。同时使用了 match 查询(属于查询类型之一,后续内容中我们将着重讲述)。 === More-Complicated Searches -=== 更复杂的搜索 Let's make the search a little more complicated.((("searches", "more complicated")))((("filters"))) We still want to find all employees with a last name of Smith, but we want only employees who are older than 30. Our query will change a little to accommodate a _filter_, which allows us to execute structured searches efficiently: -让我们试试更复杂的搜索吧:同样搜索出姓氏为 Smith 的员工,但是这次我们只需要其中年龄大于 30 的。我们的查询需要稍作调整来适应 filter 指令,由此我们将快速有效地执行结构化搜索。 [source,js] -------------------------------------------------- @@ -257,15 +229,12 @@ GET /megacorp/employee/_search <1> This portion of the query is the((("match queries"))) same `match` _query_ that we used before. 
<2> This portion of the query is a `range` _filter_, which((("range filters"))) will find all ages older than 30—`gt` stands for _greater than_. -<1> 这部分 'query' 与我们之前 'match' 指令中的 query 是一样的。 -<2> 这部分 'query' 是一个 'range' filter,它能帮助我们找到年龄大于 30 的对象。 Don't worry about the syntax too much for now; we will cover it in great detail later. Just recognize that we've added a _filter_ that performs a range search, and reused the same `match` query as before. Now our results show only one employee who happens to be 32 and is named Jane Smith: -当前我们不需要担心语法排列问题,之后我们将在细节部分做优化。我们只需要明确通过 filter 指令增加范围搜索功能,同时继续复用 match 查询指令。现在,我们的结果展示了恰巧只有一位员工符合条件,她叫 Jane Smith,今年 32 岁。 [source,js] -------------------------------------------------- @@ -291,16 +260,12 @@ only one employee who happens to be 32 and is named Jane Smith: -------------------------------------------------- === Full-Text Search -=== 全文本搜索 The searches so far have been simple: single names, filtered by age. Let's try a more advanced, full-text search--a ((("full text search")))task that traditional databases would really struggle with. -截止目前,搜索功能相对简单:单个姓名、年龄作筛选值。 -让我们尝试更复杂的全文本搜索,一项传统数据库难以搞定的任务。 We are going to search for all employees who enjoy rock climbing: -我们将搜索所有喜欢攀岩的员工信息: [source,js] -------------------------------------------------- @@ -317,7 +282,6 @@ GET /megacorp/employee/_search You can see that we use the same `match` query as before to search the `about` field for ``rock climbing''. We get back two matching documents: -你可以看到,我们依旧使用 match 查询指令来搜索 about 一栏内容,并用 "rock climbing" 作为 about 内容条件。由此,我们得到两项匹配的文档: [source,js] -------------------------------------------------- @@ -354,38 +318,31 @@ field for ``rock climbing''. We get back two matching documents: } -------------------------------------------------- <1> The relevance scores -<1> 相关性分值 By default, Elasticsearch sorts((("relevance scores"))) matching results by their relevance score, that is, by how well each document matches the query. 
The first and highest-scoring result is obvious: John Smith's `about` field clearly says ``rock climbing'' in it. -默认地,Elasticsearch 通过相关性分值来分类匹配的结果,即依据每份文档匹配查询条件的程度作评估。显而易见地,相关性分值最高的结果属于 John Smith,因为他在 about 一栏中填写了 "rock climbing"。 But why did Jane Smith come back as a result? The reason her document was returned is because the word ``rock'' was mentioned in her `about` field. Because only ``rock'' was mentioned, and not ``climbing,'' her `_score` is lower than John's. -但是为什么 Jane Smith 也作为结果返回了呢?她的文档信息能够返回是因为 "rock" 这个字眼在她的 'about' 一栏中被提及到,但是因为只有 'rock' 而缺少了 'climbing' 字眼,因此她的相关性分值低于 John 的。 This is a good example of how Elasticsearch can search _within_ full-text fields and return the most relevant results first. This ((("relevance", "importance to Elasticsearch")))concept of _relevance_ is important to Elasticsearch, and is a concept that is completely foreign to traditional relational databases, in which a record either matches or it doesn't. -这个案例很好地说明了 Elasticsearch 能够在全文本范围内进行搜索并优先返回相关性最强的结果。对于 Elasticsearch 来说,相关性(relevance)的概念非常重要,而这对于传统的关系数据库来说却是崭新的。 === Phrase Search -=== 词组搜索 Finding individual words in a field is all well and good, but sometimes you -want to match exact sequences of words or _phrases_.((("phrase matching"))) For instance, we could perform a query that will match only employee records that contain both ``rock'' +want to match exact sequences of words or _phrases_.((("phrase matching"))) For instance, we could +perform a query that will match only employee records that contain both ``rock'' _and_ ``climbing'' _and_ that display the words next to each other in the phrase ``rock climbing.'' -// -在一个 field 中找到特定的文本信息没有问题,但是有时候你想要搜索结果能够匹配特定序列的字眼或词组。举例说明,我们可以执行一段 query 指令来找出符合要求的员工信息:包含 "rock" 与 "climbing" 字眼且两个字眼紧挨着彼此。 To do this, we use a slight variation of the `match` query called the `match_phrase` query: -为了完成这一点,我们对 match 查询指令稍作调整,并称之为 match_phrase 查询指令: [source,js] -------------------------------------------------- @@ -401,7 +358,6 
@@ GET /megacorp/employee/_search // SENSE: 010_Intro/30_Query_DSL.json This, to no surprise, returns only John Smith's document: -看,毫无悬念,返回结果仅仅只有 John Smith 一个人的文档。 [source,js] -------------------------------------------------- @@ -429,15 +385,12 @@ This, to no surprise, returns only John Smith's document: [[highlighting-intro]] === Highlighting Our Searches -=== 突出我们的搜索 Many applications like to _highlight_ snippets((("searches", "highlighting search results")))((("highlighting searches"))) of text from each search result so the user can see _why_ the document matched the query. Retrieving highlighted fragments is easy in Elasticsearch. -许多应用喜欢在每个搜索结果中高亮片段信息,由此用户可以更清晰地知道为什么搜索结果符合查询条件。实际上,在 Elasticsearch 中检索高亮片段不难。 Let's rerun our previous query, but add a new `highlight` parameter: -我们只要重新运行之前的查询指令,但是需要增加一个新的 highlight 参数: [source,js] -------------------------------------------------- @@ -461,8 +414,6 @@ When we run this query, the same hit is returned as before, but now we get a new section in the response called `highlight`. This contains a snippet of text from the `about` field with the matching words wrapped in `` HTML tags: -当我们运行这段查询指令时,返回的采样数会保持不变,同时我们还将在返回值中获得一段名为 'highlight' 的片段。这个片段包含了 'about' 一栏内匹配的词组信息,并用 HTML 标签格式('')进行高亮包装。 - [source,js] -------------------------------------------------- @@ -494,8 +445,6 @@ HTML tags: -------------------------------------------------- <1> The highlighted fragment from the original text -<1> 源文本中高亮的片段信息 You can read more about the highlighting of search snippets in the {ref}/search-request-highlighting.html[highlighting reference documentation]. 
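How relevance ranking and highlighting fit together can be sketched with naive term counting. This is purely illustrative (Elasticsearch's real scoring uses TF/IDF-style statistics, not a raw match count); the `<em>` tags mirror the default highlighter output:

```python
import re

docs = {
    "John Smith": "I love to go rock climbing",
    "Jane Smith": "I like to collect rock albums",
}
query_terms = ["rock", "climbing"]

def score(text):
    # Naive relevance: count how many query terms occur in the text.
    words = text.lower().split()
    return sum(1 for t in query_terms if t in words)

def highlight(text):
    # Wrap each matching term in <em> tags, as the default highlighter does.
    pattern = "|".join(re.escape(t) for t in query_terms)
    return re.sub("({})".format(pattern), r"<em>\1</em>", text)

ranked = sorted(docs, key=lambda name: score(docs[name]), reverse=True)
print(ranked[0])                       # John Smith: both terms match
print(highlight(docs["John Smith"]))
```

Jane Smith still appears in the results (one term matched), just with a lower score — the same behavior the tutorial demonstrates.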
-关于搜索中的高亮片段,你可以在{ref}/search-request-highlighting.html[highlighting reference documentation].阅读更多信息。 diff --git a/010_Intro/35_Tutorial_Aggregations.asciidoc b/010_Intro/35_Tutorial_Aggregations.asciidoc index b58c46374..47429874c 100644 --- a/010_Intro/35_Tutorial_Aggregations.asciidoc +++ b/010_Intro/35_Tutorial_Aggregations.asciidoc @@ -1,13 +1,11 @@ === Analytics -=== 分析 Finally, we come to our last business requirement: allow managers to run analytics over the employee directory.((("analytics"))) Elasticsearch has functionality called -_aggregations_, which ((("aggregations")))allow you to generate sophisticated analytics over your data. It is similar to `GROUP BY` in SQL, but much more powerful. -最终,我们要讨论最后一个业务要求:允许管理者针对员工目录运行分析进程。Elasticsearch 有一项功能称之为 aggregations,通过 aggregations 系统进程可以基于你的数据产生精细的分析。与 SQL 中的 'GROUP BY' 功能类似,但是 aggregations 更加强大。 +_aggregations_, which ((("aggregations")))allow you to generate sophisticated analytics over your +data. It is similar to `GROUP BY` in SQL, but much more powerful. For example, let's find the most popular interests enjoyed by our employees: -举个例子,让我们挖掘出员工之间最受欢迎的兴趣爱好: [source,js] -------------------------------------------------- @@ -23,7 +21,6 @@ GET /megacorp/employee/_search // SENSE: 010_Intro/35_Aggregations.json Ignore the syntax for now and just look at the results: -忽略掉这些语法,让我们看看结果: [source,js] -------------------------------------------------- @@ -56,7 +53,6 @@ in sports. These aggregations are not precalculated; they are generated on the fly from the documents that match the current query. 
If we want to know the popular interests of people called Smith, we can just add the appropriate query into the mix: -我们可以看到,两位员工对音乐感兴趣,一位对林业感兴趣,一位对运动感兴趣。最终结果并不是预先统计的,而是文档信息与查询条件实时匹配后所收集的结果。如果我们想知道名为 Smith 的员工中最受欢迎的兴趣爱好,可以直接在混合参数中添加适当的查询条件: [source,js] -------------------------------------------------- @@ -79,7 +75,6 @@ GET /megacorp/employee/_search // SENSE: 010_Intro/35_Aggregations.json The `all_interests` aggregation has changed to include only documents matching our query: -是的,'all_interests' 集合条件做了调整以期只包含匹配我们查询条件的文档: [source,js] -------------------------------------------------- @@ -100,8 +95,6 @@ The `all_interests` aggregation has changed to include only documents matching o Aggregations allow hierarchical rollups too.((("aggregations", "hierarchical rollups in"))) For example, let's find the average age of employees who share a particular interest: -//?????? -举个例子,我们来查询拥有特定兴趣爱好的员工群体平均年龄: [source,js] -------------------------------------------------- @@ -123,7 +116,6 @@ GET /megacorp/employee/_search The aggregations that we get back are a bit more complicated, but still fairly easy to understand: -下面这段返回的集合值更复杂了些,不过理解起来还是相当简单的: [source,js] -------------------------------------------------- @@ -159,11 +151,6 @@ The output is basically an enriched version of the first aggregation we ran. We still have a list of interests and their counts, but now each interest has an additional `avg_age`, which shows the average age for all employees having that interest. -这段输出值基本上算是我们第一次获得的数值集合的升级版。 -在这个版本中,我们依旧获得了一串兴趣爱好清单以及相应的统计值,只不过每个兴趣都有了附加的 'avg_age' 值,代表这个兴趣爱好所覆盖员工的平均年龄。 Even if you don't understand the syntax yet, you can easily see how complex aggregations and groupings can be accomplished using this feature. The sky is the limit as to what kind of data you can extract! 
-即使你现在还不明白这些语法段,没关系,至少你可以快速了解到集合以及集合群通过 Elasticsearch 特性实现过程有多复杂。 -//你能提取的数据种类,就如天空一样,没有边际。 - diff --git a/010_Intro/40_Tutorial_Conclusion.asciidoc b/010_Intro/40_Tutorial_Conclusion.asciidoc index 5bb1fcbcf..a0e5394e4 100644 --- a/010_Intro/40_Tutorial_Conclusion.asciidoc +++ b/010_Intro/40_Tutorial_Conclusion.asciidoc @@ -1,17 +1,11 @@ === Tutorial Conclusion -=== 教程结语 Hopefully, this little tutorial was a good demonstration about what is possible in Elasticsearch. It is really just scratching the surface, and many features--such as suggestions, geolocation, percolation, fuzzy and partial matching--were omitted to keep the tutorial short. But it did highlight just how easy it is to start building advanced search functionality. No configuration was needed--just add data and start searching! -令人感到开心的是,这篇小教程对于解释 Elasticsearch 有哪些可能性,是一个不错的演示。 -目前我们仅仅是浅尝辄止,为了保持教程简洁明了,更多特性诸如 suggestions、geolocation,percolation,模糊与局部匹配被省略而过。 -但是它在证明如何简单地创建搜索功能方面作了突出成绩。不需要配置——只需要简单的添加数据。 - It's likely that the syntax left you confused in places, and you may have questions about how to tweak and tune various aspects. That's fine! The rest of the book dives into each of these issues in detail, giving you a solid understanding of how Elasticsearch works. -很可能你对各处的语法段有疑惑,同时关于如何调整语法段来获得多维度数据也有困惑,没关系!本书后续内容将细分成各个章节,以期帮助读者扎实了解 Elasticsearch 的工作原理。 diff --git a/010_Intro/45_Distributed.asciidoc b/010_Intro/45_Distributed.asciidoc index 9768be414..96f3ce298 100644 --- a/010_Intro/45_Distributed.asciidoc +++ b/010_Intro/45_Distributed.asciidoc @@ -1,14 +1,10 @@ === Distributed Nature -=== 分布式特性 At the beginning of this chapter, we said that Elasticsearch((("distributed nature of Elasticsearch"))) can scale out to hundreds (or even thousands) of servers and handle petabytes of data. While our tutorial gave examples of how to use Elasticsearch, it didn't touch on the mechanics at all. 
Elasticsearch is distributed by nature, and it is designed to hide the complexity that comes with being distributed. -在本章开头,我们已经提到过 Elasticsearch 既可以拓展到数百种(甚至数千种)服务器,也可以处理字节级数据。 -在如何使用 Elasticsearch 方面给出实际案例时,这份教程并不涉及任何复杂性工作。 -Elasticsearch 生来即分布式,同时在设计时已经隐藏了由分布式所带来的复杂特性。 The distributed aspect of Elasticsearch is largely transparent. Nothing in the tutorial required you to know about distributed systems, sharding, cluster @@ -16,32 +12,24 @@ discovery, or dozens of other distributed concepts. It happily ran the tutorial on a single node living inside your laptop, but if you were to run the tutorial on a cluster containing 100 nodes, everything would work in exactly the same way. -Elasticsearch 的分布式特性很大程度上是透明的。教程不会要求你了解分布式系统、分片、集群算法或其他数十种分布式概念。 -它会在你的电脑中的每个节点上开心地跑程序,即使你要在一个包含 100 个节点的集群上运行程序,一切依旧顺畅。 Elasticsearch tries hard to hide the complexity of distributed systems. Here are some of the operations happening automatically under the hood: -Elasticsearch 尽其所能来隐藏分布式系统的复杂性。这里我们列举了一些自动运行的操作: * Partitioning your documents into different containers((("documents", "partitioning into shards")))((("shards"))) or _shards_, which can be stored on a single node or on multiple nodes - * 在不同容器或分片中进行文档分区,同时操作可以储存在单一或多个节点中; * Balancing these shards across the nodes in your cluster to spread the indexing and search load - * 为拓展索引以及搜索负载量,在集群中跨节点平衡分片; * Duplicating each shard to provide redundant copies of your data, to prevent data loss in case of hardware failure - * 复制每个分片来保证数据多余拷贝,这样能防止因为硬件故障导致数据丢失; * Routing requests from any node in the cluster to the nodes that hold the data you're interested in - * 将集群中任一节点中的请求与存储有你感兴趣数据的节点联系在一起; * Seamlessly integrating new nodes as your cluster grows or redistributing shards to recover from node loss - * 随着集群增长,无缝地整合新节点;自动重新分配分片来恢复丢失节点。 As you read through this book, you'll encounter supplemental chapters about the distributed nature of Elasticsearch. 
These chapters will teach you about @@ -49,10 +37,9 @@ how the cluster scales and deals with failover (<>), handles document storage (<>), executes distributed search (<>), and what a shard is and how it works (<>). -当你阅读此教程时,你会遇到有关 Elasticsearch 分布式特性的补充章节。这些章节将教你有关集群规模、如何处理故障转移、处理文档存储、执行分布式搜索、什么是分区(shard)以及它的工作原理。 These chapters are not required reading--you can use Elasticsearch without understanding these internals--but they will provide insight that will make your knowledge of Elasticsearch more complete. Feel free to skim them and revisit at a later point when you need a more complete understanding. -这些章节不强制你阅读,你完全可以在没有理解这些内核知识的情况下使用 Elasticsearch。但是他们会帮助你在学习 Elasticsearch 时有更多的洞察力。跳过他们吧,不要有压力,当你需要更深入的理解时回过头来读也无妨。 + diff --git a/010_Intro/50_Conclusion.asciidoc b/010_Intro/50_Conclusion.asciidoc index db1d2db88..b386d3716 100644 --- a/010_Intro/50_Conclusion.asciidoc +++ b/010_Intro/50_Conclusion.asciidoc @@ -1,17 +1,13 @@ === Next Steps -=== 后续步骤 By now you should have a taste of what you can do with Elasticsearch, and how easy it is to get started. Elasticsearch tries hard to work out of the box with minimal knowledge and configuration. The best way to learn Elasticsearch is by jumping in: just start indexing and searching! -相信现在你已经对于能够通过 Elasticsearch 实现什么样的功能、以及操作的简易程度有了初步概念。Elasticsearch 努力通过最少的知识以及认证来达到开箱即用的效果。 -我相信,学习 Elasticsearch 最好的方式是参与:开始属于你的索引和搜索吧。 However, the more you know about Elasticsearch, the more productive you can become. The more you can tell Elasticsearch about the domain-specific elements of your application, the more you can fine-tune the output. -然而,对于 Elasticsearch 你知道得越多,你就更有生产力。对于你的应用你能掌握更多特定域元素,你就能更好地微调输出值。 The rest of this book will help you move from novice to expert. Each chapter explains the essentials, but also includes expert-level tips. 
If you're just getting started, these tips are probably not immediately relevant @@ -19,4 +15,3 @@ to you; Elasticsearch has sensible defaults and will generally do the right thing without any interference. You can always revisit these chapters later, when you are looking to improve performance by shaving off any wasted milliseconds. -本书的后续内容将帮助你从新手向专家转变,每个章节不仅阐述必要信息,而且包含专家级建议。如果你仍旧是新手水平,这些建议短时间内不一定适用;Elasticsearch 具备的默认值通常能保证在没有干扰下正确执行命令。当你寻求方式以提高程序质量来节约以毫秒计的浪费,到时回过头来重新阅读这些章节不迟。 diff --git a/130_Partial_Matching/05_Postcodes.asciidoc b/130_Partial_Matching/05_Postcodes.asciidoc index 18b27e22f..d3a47a907 100644 --- a/130_Partial_Matching/05_Postcodes.asciidoc +++ b/130_Partial_Matching/05_Postcodes.asciidoc @@ -1,19 +1,22 @@ -=== 邮编与结构化数据 +=== Postcodes and Structured Data -我们会使用美国目前使用的邮编形式(United Kingdom postcodes 标准)来说明如何用部分匹配查询结构化数据。((("partial matching", "postcodes and structured data")))这种邮编形式有很好的结构定义。例如,邮编 `W1V 3DG` 可以分解成如下形式:((("postcodes (UK), partial matching with"))) +We will use United Kingdom postcodes (postal codes in the United States) to illustrate how((("partial matching", "postcodes and structured data"))) to use partial matching with +structured data. UK postcodes have a well-defined structure. 
For instance, the +postcode `W1V 3DG` can((("postcodes (UK), partial matching with"))) be broken down as follows: -* `W1V` :这是邮编的外部,它定义了邮件的区域和行政区: +* `W1V`: This outer part identifies the postal area and district: -** `W` 代表区域( 1 或 2 个字母) -** `1V` 代表行政区( 1 或 2 个数字,可能跟着一个字符) +** `W` indicates the area (one or two letters) +** `1V` indicates the district (one or two numbers, possibly followed by a letter) -* `3DG` :内部定义了街道或建筑: +* `3DG`: This inner part identifies a street or building: -** `3` 代表街区区块( 1 个数字) -** `DG` 代表单元( 2 个字母) +** `3` indicates the sector (one number) +** `DG` indicates the unit (two letters) -假设将邮编作为 `not_analyzed` 的精确值字段索引,所以可以为其创建索引,如下: +Let's assume that we are indexing postcodes as exact-value `not_analyzed` +fields, so we could create our index as follows: [source,js] -------------------------------------------------- @@ -33,7 +36,7 @@ PUT /my_index -------------------------------------------------- // SENSE: 130_Partial_Matching/10_Prefix_query.json -然后索引一些邮编:((("indexing", "postcodes"))) +And index some ((("indexing", "postcodes")))postcodes: [source,js] -------------------------------------------------- @@ -54,4 +57,4 @@ PUT /my_index/address/5 -------------------------------------------------- // SENSE: 130_Partial_Matching/10_Prefix_query.json -现在这些数据已可查询。 +Now our data is ready to be queried. 
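With the postcodes stored as exact `not_analyzed` terms, a `prefix` query is conceptually just a walk over the term dictionary keeping every term that starts with the given characters — no analysis, exact bytes only. A toy sketch (sample postcodes assumed, in the style of the ones indexed above):

```python
# Exact, unanalyzed terms as they would sit in the term dictionary.
postcodes = ["W1V 3DG", "W2F 8HW", "W1F 7HW", "WC1N 1LZ", "SW5 0BE"]

def prefix_query(prefix):
    # Keep every term whose first characters match the prefix exactly.
    return sorted(pc for pc in postcodes if pc.startswith(prefix))

print(prefix_query("W1"))   # every postcode in the W1 district
```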
diff --git a/270_Fuzzy_matching/40_Fuzzy_match_query.asciidoc b/270_Fuzzy_matching/40_Fuzzy_match_query.asciidoc index 8dc1ac471..2a32ef3a4 100644 --- a/270_Fuzzy_matching/40_Fuzzy_match_query.asciidoc +++ b/270_Fuzzy_matching/40_Fuzzy_match_query.asciidoc @@ -1,7 +1,7 @@ [[fuzzy-match-query]] -=== 模糊匹配查询 +=== Fuzzy match Query -`match` 查询支持((("typoes and misspellings", "fuzzy match query")))((("match query", "fuzzy matching")))((("fuzzy matching", "match query")))开箱即用的模糊匹配: +The `match` query supports ((("typoes and misspellings", "fuzzy match query")))((("match query", "fuzzy matching")))((("fuzzy matching", "match query")))fuzzy matching out of the box: [source,json] ----------------------------------- @@ -19,9 +19,11 @@ GET /my_index/my_type/_search } ----------------------------------- -查询字符串首先进行分析,会产生词项 `[surprize, me]` ,并且每个词项根据指定的 `fuzziness` 进行模糊化。 +The query string is first analyzed, to produce the terms `[surprize, me]`, and +then each term is fuzzified using the specified `fuzziness`. -同样, `multi_match` 查询也((("multi_match queries", "fuzziness support")))支持 `fuzziness` ,但只有当执行查询时类型是 `best_fields` 或者 `most_fields` : +Similarly, the `multi_match` query also ((("multi_match queries", "fuzziness support")))supports `fuzziness`, but only when +executing with type `best_fields` or `most_fields`: [source,json] ----------------------------------- @@ -37,6 +39,9 @@ GET /my_index/my_type/_search } ----------------------------------- -`match` 和 `multi_match` 查询都支持 `prefix_length` 和 `max_expansions` 参数。 +Both the `match` and `multi_match` queries also support the `prefix_length` +and `max_expansions` parameters. + +TIP: Fuzziness works only with the basic `match` and `multi_match` queries. It +doesn't work with phrase matching, common terms, or `cross_fields` matches. 
-TIP: 模糊性(Fuzziness)只能在 `match` and `multi_match` 查询中使用。不能使用在短语匹配、常用词项或 `cross_fields` 匹配。 diff --git a/400_Relationships/20_Denormalization.asciidoc b/400_Relationships/20_Denormalization.asciidoc index 2a39c4e91..9b72605f5 100644 --- a/400_Relationships/20_Denormalization.asciidoc +++ b/400_Relationships/20_Denormalization.asciidoc @@ -1,11 +1,15 @@ [[denormalization]] -=== 非规范化你的数据 +=== Denormalizing Your Data +The way to get the best search performance out of Elasticsearch is to use it +as it is intended, by((("relationships", "denormalizing your data")))((("denormalization", "denormalizing data at index time"))) +http://en.wikipedia.org/wiki/Denormalization[denormalizing] your data at index +time. Having redundant copies of data in each document that requires access to +it removes the need for joins. -使用 Elasticsearch 得到最好的搜索性能的方法是有目的的通过在索引时进行非规范化 ((("relationships", "denormalizing your data")))((("denormalization", "denormalizing data at index time"))) -http://en.wikipedia.org/wiki/Denormalization[denormalizing]。对每个文档保持一定数量的冗余副本可以在需要访问时避免进行关联。 +If we want to be able to find a blog post by the name of the user who wrote it, +include the user's name in the blog-post document itself: -如果我们希望能够通过某个用户姓名找到他写的博客文章,可以在博客文档中包含这个用户的姓名: [source,json] -------------------------------- @@ -26,9 +30,10 @@ PUT /my_index/blogpost/2 } } -------------------------------- -<1> 这部分用户的字段数据已被冗余到 `blogpost` 文档中。 +<1> Part of the user's data has been denormalized into the `blogpost` document. -现在,我们通过单次查询就能够通过 `relationships` 找到用户 `John` 的博客文章。 +Now, we can find blog posts about `relationships` by users called `John` +with a single query: [source,json] -------------------------------- @@ -45,4 +50,7 @@ GET /my_index/blogpost/_search } -------------------------------- -数据非规范化的优点是速度快。因为每个文档都包含了所需的所有信息,当这些信息需要在查询进行匹配时,并不需要进行昂贵的联接操作。 +The advantage of data denormalization is speed. 
Because each document
+contains all of the information that is required to determine whether it
+matches the query, there is no need for expensive joins.
+

From c26d1ea56310fd2bf20f6d3b78d94f62fcde1b6a Mon Sep 17 00:00:00 2001
From: Josephjin
Date: Wed, 7 Sep 2016 10:42:46 +0800
Subject: [PATCH 3/8] chapter1:/010_intro/25_Tutorial_Indexing.asciidoc

---
 010_Intro/25_Tutorial_Indexing.asciidoc | 52 ++++++++++++++++++++++---
 1 file changed, 47 insertions(+), 5 deletions(-)

diff --git a/010_Intro/25_Tutorial_Indexing.asciidoc b/010_Intro/25_Tutorial_Indexing.asciidoc
index 4bfe58049..0ac6cbcd1 100644
--- a/010_Intro/25_Tutorial_Indexing.asciidoc
+++ b/010_Intro/25_Tutorial_Indexing.asciidoc
@@ -3,21 +3,28 @@
To give you a feel for what is possible in Elasticsearch and how easy
it is to use, let's start by walking through a simple tutorial that covers
basic concepts such as indexing, search, and aggregations.
+为了感受 Elasticsearch 能做什么以及它的易用程度,让我们先走一遍一个简单的教程,其中涵盖索引、搜索、聚合等基础概念。

We'll introduce some new terminology and basic concepts along the way, but it
is OK if you don't understand everything immediately.  We'll cover all the
concepts introduced here in _much_ greater depth throughout the rest of the
book.
+我们将沿途介绍一些新的术语和基础概念,即使不能立即全部理解也没有关系。
+在本书的剩余部分,我们将以_更深入_的角度讲解这里介绍的所有概念。

So, sit back and enjoy a whirlwind tour of what Elasticsearch is capable of.
+那么,请坐好,享受一趟领略 Elasticsearch 能力的旋风之旅吧。

==== Let's Build an Employee Directory
+==== 让我们创建一个雇员目录

We happen((("employee directory, building (example)"))) to work for _Megacorp_, and as part of HR's new _"We love our
drones!"_ initiative, we have been tasked with creating an employee directory.
The directory is supposed to foster employer empathy and real-time, synergistic, dynamic collaboration, so it has a few business
requirements:
+撰写此书时,我们正好受聘于 _Megacorp_ 公司。作为 HR 部门新的“我们爱无人机!”激励计划的一部分,我们的任务是创建一个雇员目录。该目录应当能培养雇员的认同感,并支持实时、协同、动态的协作,因此它有一些业务需求:
+

* Enable data to contain multi value tags, numbers, and full text.
* Retrieve the full details of any employee.
@@ -26,18 +33,27 @@ business requirements:
* Return highlighted search _snippets_ from the text in the matching
  documents.
* Enable management to build analytic dashboards over the data.
+* 支持数据包含多值标签、数值以及全文本
+* 检索任一雇员的完整信息
+* 支持结构化搜索,例如查找 30 岁以上的雇员
+* 支持简单的全文搜索以及更复杂的_短语_搜索
+* 在匹配文档的文本中返回高亮的搜索_片段_
+* 支持管理层基于数据构建分析仪表盘
+

=== Indexing Employee Documents
+=== 索引雇员文档

The first order of business is storing employee data.((("documents", "indexing")))((("indexing"))) This will take the
form of an _employee document_: a single document represents a single
employee.  The act of storing data in Elasticsearch is called _indexing_, but
before we can index a document, we need to decide _where_ to store it.
+首要任务是存储雇员数据。这将以_雇员文档_的形式进行:一个文档代表一个雇员。在 Elasticsearch 中存储数据的行为叫做_索引_(indexing),但在索引一个文档之前,我们需要确定将它存储在_哪里_。

-An Elasticsearch cluster can((("clusters", "indices in")))((("indices", "in clusters"))) contain multiple _indices_, which in
-turn contain multiple _types_.((("tables"))) These types hold multiple _documents_,
+An Elasticsearch cluster can((("clusters", "indices in")))((("indices", "in clusters"))) contain multiple _indices_, which in turn contain multiple _types_.((("tables"))) These types hold multiple _documents_,
and each document has((("fields"))) multiple _fields_.
+一个 Elasticsearch 集群可以包含多个_索引_(indices),相应地每个索引又可以包含多个_类型_(types)。这些类型存储着多个_文档_(documents),每个文档又有多个_域_(fields)。

.Index Versus Index Versus Index
**************************************************

@@ -45,21 +61,28 @@ and each document has((("fields"))) multiple _fields_.
You may already have noticed that the word _index_ is overloaded with
several meanings in the context of Elasticsearch.((("index, meanings in Elasticsearch"))) A little
clarification is necessary:
+你也许已经注意到,_index_ 这个词在 Elasticsearch 的语境中有多重含义,有必要做一点澄清:

Index (noun)::
+索引(名词):

As explained previously, an _index_ is like a _database_ in a traditional
relational database. It is the place to store related documents. The plural
of _index_ is _indices_ or _indexes_.
+如前所述,一个_索引_类似于传统关系数据库中的一个_数据库_,是存储相关文档的地方。_index_ 的复数形式为 _indices_ 或 _indexes_。

Index (verb)::
+索引(动词):

_To index a document_ is to store a document in an _index (noun)_ so
that it can be retrieved and queried. It is much like the `INSERT` keyword in
SQL except that, if the document already exists, the new document would
replace the old.
+_索引一个文档_就是把一个文档存储到一个_索引_(名词)中,以便它可以被检索(retrieved)和查询(queried)。这很像 SQL 中的 `INSERT` 关键字,不同之处在于:如果文档已经存在,新文档会替换旧文档。

Inverted index::
+倒排索引:

Relational databases add an _index_, such as a B-tree index,((("relational databases", "indices"))) to specific
columns in order to improve the speed of data retrieval.  Elasticsearch and
@@ -69,19 +92,29 @@ purpose.
By default, every field in a document is _indexed_ (has an inverted
index) and thus is searchable. A field without an inverted index is not
searchable. We discuss inverted indexes in more detail in <>.
+关系型数据库通过在指定的列上添加一个_索引_,比如一个 B 树(B-tree)索引,来提升数据检索的速度。Elasticsearch 和 Lucene 使用一种叫做_倒排索引_(inverted index)的结构来达到相同的目的。
+默认情况下,文档中的每个域都会被_索引_(拥有一个倒排索引),因而可以被搜索。没有倒排索引的域是无法被搜索到的。
+我们会在后面的章节中更详细地讨论倒排索引。

**************************************************

So for our employee directory, we are going to do the following:
+关于我们的雇员目录,我们将执行如下操作:
+

-* Index a _document_ per employee, which contains all the details of a single
-  employee.
-* Each document will be((("types", "in employee directory (example)"))) of _type_ `employee`.
+* Index a _document_ per employee, which contains all the details of a single
+  employee.
+* Each document will be((("types", "in employee directory (example)"))) of _type_ `employee`.
* That type will live in the `megacorp` _index_.
* That index will reside within our Elasticsearch cluster.
+* 为每个雇员的_文档_建立索引,每个文档包含一个雇员的所有信息;
+* 每个文档的类型为 `employee`;
+* `employee` 类型存在于 `megacorp` _索引_中;
+* `megacorp` 索引保存在我们的 Elasticsearch 集群中。

In practice, this is easy (even though it looks like a lot of steps).  We
can perform all of those actions in a single command:
+在实践中这很容易(尽管看起来步骤很多),我们可以通过一条命令完成上述全部动作:

[source,js]
--------------------------------------------------
@@ -98,26 +131,35 @@ PUT /megacorp/employee/1

Notice that the path `/megacorp/employee/1` contains three pieces of
information:
+请注意,路径 `/megacorp/employee/1` 包含了三条信息:

+megacorp+::
  The index name
++megacorp+: 索引名

+employee+::
  The type name
++employee+: 类型名

+1+::
  The ID of this particular employee
++1+: 这个特定雇员的 ID

The request body--the JSON document--contains all the information about
this employee.  His name is John Smith, he's 25, and enjoys rock climbing.
+请求体——也就是那份 JSON 文档——包含了这个雇员的所有信息。他名叫约翰·史密斯(John Smith),今年 25 岁,喜欢攀岩。

Simple!  There was no need to perform any administrative tasks first, like
creating an index or specifying the type of data that each field contains.
We could just index a document directly.  Elasticsearch ships with defaults
for everything, so all the necessary administration tasks were taken care of
in the background, using default values.
+很简单!我们无需先执行任何管理性任务,比如创建索引或指定每个域所包含的数据类型,
+就可以直接索引一个文档。
+Elasticsearch 为一切都提供了默认值,所有必要的管理任务都使用默认值在后台完成了。

Before moving on, let's add a few more employees to the directory:
+继续之前,让我们再往目录中添加几名雇员:

[source,js]
--------------------------------------------------

From dd370c469d22af2091fcca06f59bf7b97247dbad Mon Sep 17 00:00:00 2001
From: Josephjin
Date: Wed, 7 Sep 2016 10:44:17 +0800
Subject: [PATCH 4/8] chapter1:/010_intro/30_Tutorial_Search.asciidoc

---
 010_Intro/30_Tutorial_Search.asciidoc | 59 +++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/010_Intro/30_Tutorial_Search.asciidoc b/010_Intro/30_Tutorial_Search.asciidoc
index dfdd2781e..aba0fdceb 100644
--- a/010_Intro/30_Tutorial_Search.asciidoc
+++ b/010_Intro/30_Tutorial_Search.asciidoc
@@ -1,12 +1,18 @@
=== Retrieving a Document
+=== 检索文档

Now that we have some data stored in Elasticsearch,((("documents", "retrieving"))) we can get to work on the
business requirements for this application.  The first requirement is the
ability to retrieve individual employee data.
+现在 Elasticsearch 中已经存储了一些数据,我们可以着手处理这个应用的业务需求了。
+第一个需求是能够检索单个雇员的数据。
+
This is easy in Elasticsearch.
We simply execute((("HTTP requests", "retrieving a document with GET"))) an HTTP
+GET+ request and specify the _address_ of the document--the index, type, and
ID.((("id", "specifying in a request")))((("indices", "specifying index in a request")))((("types", "specifying type in a request"))) Using those three pieces of
information, we can return the original JSON document:
+这在 Elasticsearch 中很简单。我们只需执行一个 HTTP GET 请求,并指定文档的_地址_——索引、类型和 ID。
+使用这三部分信息,我们就可以取回原始的 JSON 文档:

[source,js]
--------------------------------------------------
GET /megacorp/employee/1
--------------------------------------------------
@@ -16,6 +22,7 @@ GET /megacorp/employee/1

And the response contains some metadata about the document, and John Smith's
original JSON document ((("_source field", sortas="source field")))as the `_source` field:
+响应包含了这个文档的一些元数据,而约翰·史密斯(John Smith)的原始 JSON 文档则作为 `_source` 域返回:

[source,js]
--------------------------------------------------
@@ -36,20 +43,26 @@ original JSON document ((("_source field", sortas="source field")))as the `_sour
--------------------------------------------------

[TIP]
====
In the same way that we changed ((("HTTP methods")))the HTTP verb from `PUT` to `GET` in order
to retrieve the document, we could use the `DELETE` verb to delete the
document, and the `HEAD` verb to check whether the document exists. To replace
an existing document with an updated version, we just `PUT` it again.
====
+提示:正如我们把 HTTP 动词从 `PUT` 换成 `GET` 来检索文档一样,我们还可以用 `DELETE` 动词删除文档,用 `HEAD` 动词检查文档是否存在。要用新版本替换已有文档,只需再次 `PUT` 即可。

=== Search Lite
+=== 轻量搜索

-A `GET` is fairly simple--you get back the document that you ask for.((("GET method")))((("searches", "simple search"))) Let's
-try something a little more advanced, like a simple search!
+A `GET` is fairly simple--you get back the document that you ask for.((("GET method")))((("searches", "simple search")))Let's try something a little more advanced, like a simple search!
+`GET` 请求相当简单——你可以直接取回想要的文档。
+让我们尝试一些稍微高级的功能,比如一个简单的搜索!

The first search we will try is the simplest search possible.
We will search for all employees, with this request:
+我们尝试的第一个搜索是最简单的搜索。我们将用如下请求搜索全部雇员:

[source,js]
--------------------------------------------------
@@ -61,6 +74,8 @@
You can see that we're still using index `megacorp` and type `employee`, but
instead of specifying a document ID, we now use the `_search` endpoint. The
response includes all three of our documents in the `hits` array. By default,
a search will return the top 10 results.
+可以看到,我们依然在使用索引 `megacorp` 和类型 `employee`,但不再指定某个文档的 ID,而是使用 `_search` _端点_(endpoint)。响应的 `hits` 数组中包含了我们全部的三个文档。
+默认情况下,搜索会返回前 10 条结果。

[source,js]
--------------------------------------------------
@@ -119,11 +134,14 @@ a search will return the top 10 results.

NOTE: The response not only tells us which documents matched, but also
includes the whole document itself: all the information that we need in order
to display the search results to the user.
+注意:响应不仅告诉我们哪些文档被匹配到,还包含了完整的文档本身:也就是向用户展示搜索结果所需的全部信息。

Next, let's try searching for employees who have ``Smith'' in their last
name. To do this, we'll use a _lightweight_ search method that is easy to use
from the command line. This method is often referred to as ((("query strings")))a _query-string_ search, since we pass the search as a URL query-string parameter:
+接下来,让我们尝试搜索姓氏为“Smith”的雇员。
+为此,我们将使用一种易于在命令行中使用的_轻量_搜索方法。这种方法常被称作_查询字符串_(query-string)搜索,因为我们把搜索条件作为 URL 的查询字符串参数传递:

[source,js]
--------------------------------------------------
GET /megacorp/employee/_search?q=last_name:Smith
--------------------------------------------------

We use the same `_search` endpoint in the path, and we add the query itself in
the `q=` parameter. The results that come back show all Smiths:
+我们仍然在路径中使用 `_search` 端点,并把查询本身放在 `q=` 参数中。
+返回的结果显示了所有姓 Smith 的雇员:

[source,js]
--------------------------------------------------
@@ -168,14 +188,19 @@ the `q=` parameter.
The results that come back show all Smiths:
--------------------------------------------------

=== Search with Query DSL
+=== 使用 Query DSL 搜索

Query-string search is handy for ad hoc searches((("ad hoc searches"))) from the command line, but
it has its limitations (see <>). Elasticsearch provides a rich,
flexible, query language called the _query DSL_, which((("Query DSL"))) allows us to build
much more complicated, robust queries.
+查询字符串搜索对于通过命令行进行_即席_(ad hoc)搜索非常方便,但它也有局限性(参见 _Search Lite_ 一节)。
+Elasticsearch 提供了一种丰富而灵活的查询语言,叫做_查询表达式_(query DSL),它允许我们构建更加复杂、健壮的查询。

The _domain-specific language_ (DSL) is((("DSL (Domain Specific Language)"))) specified using a JSON request body.
We can represent the previous search for all Smiths like so:
+_领域特定语言_(DSL)使用 JSON 请求体来指定。
+之前搜索所有姓 Smith 的查询,可以这样表示:


[source,js]
@@ -196,13 +221,16 @@ number of things have changed.  For one, we are no longer using _query-string_
parameters, but instead a request body. This request body is built with JSON,
and uses a `match` query (one of several types of queries, which we will
learn about later).
+这会返回与之前查询相同的结果。你可以看到有些东西变了:我们不再使用_查询字符串_参数,而是使用一个请求体来替代。这个请求体用 JSON 构建,并使用了一个 `match` 查询(属于查询类型之一,后面会详细介绍)。

=== More-Complicated Searches
+=== 更复杂的搜索

Let's make the search a little more complicated.((("searches", "more complicated")))((("filters")))  We still want to find all
employees with a last name of Smith, but  we want only employees who are
older than 30.  Our query will change a little to accommodate a _filter_,
which allows us to execute structured searches efficiently:
+让我们把搜索变得复杂一些:我们仍然想找出姓氏为 Smith 的雇员,但这次只要年龄大于 30 岁的。查询需要稍作调整以加入一个_过滤器_(filter),它允许我们高效地执行结构化搜索:

[source,js]
--------------------------------------------------
@@ -229,12 +257,15 @@ GET /megacorp/employee/_search

<1> This portion of the query is the((("match queries"))) same `match` _query_ that we used before.
<2> This portion of the query is a `range` _filter_, which((("range filters"))) will find all ages
older than 30—`gt` stands for _greater than_.
+<1> 这部分查询与我们之前使用的 `match` _查询_相同。
+<2> 这部分查询是一个 `range` _过滤器_,它会找到所有年龄大于 30 岁的文档——`gt` 表示_大于_(greater than)。

Don't worry about the syntax too much for now; we will cover it in great
detail later.  Just recognize that we've added a _filter_ that performs a
range search, and reused the same `match` query as before.  Now our results show
only one employee who happens to be 32 and is named Jane Smith:
+现在先不要太担心语法,稍后我们会详细介绍它。只需认识到:我们添加了一个执行范围搜索的_过滤器_,并复用了之前的 `match` 查询。现在结果中只显示了一名雇员,她恰好 32 岁,名叫 Jane Smith:

[source,js]
--------------------------------------------------
@@ -260,12 +291,16 @@ only one employee who happens to be 32 and is named Jane Smith:
--------------------------------------------------

=== Full-Text Search
+=== 全文搜索

The searches so far have been simple:  single names, filtered by age. Let's
try a more advanced, full-text search--a ((("full text search")))task that traditional databases
would really struggle with.
+到目前为止的搜索都很简单:单个姓名,按年龄过滤。
+让我们尝试一种更高级的全文搜索——一项传统数据库确实很难应付的任务。

We are going to search for all employees who enjoy rock climbing:
+我们将搜索所有喜欢攀岩(rock climbing)的雇员:

[source,js]
--------------------------------------------------
@@ -282,6 +317,7 @@ GET /megacorp/employee/_search

You can see that we use the same `match` query as before to search the `about`
field for ``rock climbing''. We get back two matching documents:
+可以看到,我们使用了和之前一样的 `match` 查询,在 `about` 域中搜索“rock climbing”。我们会得到两个匹配的文档:

[source,js]
--------------------------------------------------
@@ -318,31 +354,38 @@ field for ``rock climbing''. We get back two matching documents:
 }
--------------------------------------------------
<1> The relevance scores
+<1> 相关性评分

By default, Elasticsearch sorts((("relevance scores"))) matching results by their relevance score,
that is, by how well each document matches the query.
The first and highest-scoring result is obvious: John Smith's `about` field
clearly says ``rock climbing'' in it.
+默认情况下,Elasticsearch 按相关性评分对匹配结果_排序_,也就是按每个文档与查询的匹配程度排序。第一个、评分最高的结果显而易见:John Smith 的 `about` 域里明确写着“rock climbing”。

But why did Jane Smith come back as a result?  The reason her document was
returned is because the word ``rock'' was mentioned in her `about` field.
Because only ``rock'' was mentioned, and not ``climbing,'' her `_score` is
lower than John's.
+但是为什么 Jane Smith 也出现在结果中呢?她的文档之所以被返回,是因为“rock”一词出现在了她的 `about` 域里。由于只提到了“rock”而没有“climbing”,她的 `_score` 比 John 的低。

This is a good example of how Elasticsearch can search _within_ full-text
fields and return the most relevant results first. This ((("relevance", "importance to Elasticsearch")))concept of _relevance_
is important to Elasticsearch, and is a concept that is completely foreign to
traditional relational databases, in which a record either matches or it doesn't.
+这是一个很好的例子,说明 Elasticsearch 可以_在_全文域中搜索,并首先返回相关性最强的结果。_相关性_(relevance)的概念对 Elasticsearch 至关重要,而它对传统关系数据库来说是完全陌生的——在关系数据库中,一条记录要么匹配,要么不匹配。

=== Phrase Search
+=== 短语搜索

Finding individual words in a field is all well and good, but sometimes you
-want to match exact sequences of words or _phrases_.((("phrase matching"))) For instance, we could
-perform a query that will match only  employee records that contain both ``rock''
+want to match exact sequences of words or _phrases_.((("phrase matching"))) For instance, we could perform a query that will match only  employee records that contain both ``rock''
_and_ ``climbing'' _and_ that display the words next to each other in the
phrase ``rock climbing.''
+在域中找到单独的词已经很好了,但有时你想要精确匹配若干个词组成的序列,也就是_短语_。例如,我们可以执行这样一个查询:仅匹配同时包含“rock”_和_“climbing”、且这两个词彼此相邻的雇员记录(即短语“rock climbing”)。

To do this, we use a slight variation of the `match` query called the
`match_phrase` query:
+为此,我们使用 `match` 查询的一个变体,叫做 `match_phrase` 查询:

[source,js]
--------------------------------------------------
@@ -358,6 +401,7
@@ GET /megacorp/employee/_search
// SENSE: 010_Intro/30_Query_DSL.json

This, to no surprise, returns only John Smith's document:
+毫无悬念,这次只返回了 John Smith 的文档:

[source,js]
--------------------------------------------------
@@ -385,12 +429,15 @@ This, to no surprise, returns only John Smith's document:

[[highlighting-intro]]
=== Highlighting Our Searches
+=== 高亮搜索结果

Many applications like to _highlight_ snippets((("searches", "highlighting search results")))((("highlighting searches"))) of text from each search result
so the user can see _why_ the document matched the query.  Retrieving
highlighted fragments is easy in Elasticsearch.
+许多应用喜欢从每个搜索结果中_高亮_文本片段,以便用户看到文档是_为什么_匹配查询的。在 Elasticsearch 中检索高亮片段很容易。

Let's rerun our previous query, but add a new `highlight` parameter:
+让我们重新运行之前的查询,并添加一个新的 `highlight` 参数:

[source,js]
--------------------------------------------------
@@ -414,6 +461,8 @@
When we run this query, the same hit is returned as before, but now we get a
new section in the response called `highlight`. This contains a snippet of
text from the `about` field with the matching words wrapped in `<em></em>`
HTML tags:
+运行这个查询时,返回的命中结果与之前一样,但现在响应中多了一个叫做 `highlight` 的新部分。它包含了来自 `about` 域的一段文本片段,其中匹配的词被 `<em></em>` HTML 标签包裹:
+

[source,js]
--------------------------------------------------
@@ -445,6 +494,8 @@ HTML tags:
--------------------------------------------------
<1> The highlighted fragment from the original text
+<1> 原始文本中高亮的片段

You can read more about the highlighting of search snippets in the
{ref}/search-request-highlighting.html[highlighting reference documentation].
+关于搜索片段的高亮,你可以在 {ref}/search-request-highlighting.html[highlighting reference documentation] 中阅读更多信息。

From 498677c33e016e7468d2126e463c9f2abed6df2a Mon Sep 17 00:00:00 2001
From: Josephjin
Date: Wed, 7 Sep 2016 10:45:06 +0800
Subject: [PATCH 5/8] chapter1:/010_intro/35_Tutorial_Aggregations.asciidoc

---
 010_Intro/35_Tutorial_Aggregations.asciidoc | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/010_Intro/35_Tutorial_Aggregations.asciidoc b/010_Intro/35_Tutorial_Aggregations.asciidoc
index 47429874c..b58c46374 100644
--- a/010_Intro/35_Tutorial_Aggregations.asciidoc
+++ b/010_Intro/35_Tutorial_Aggregations.asciidoc
@@ -1,11 +1,13 @@
=== Analytics
+=== 分析

Finally, we come to our last business requirement: allow managers to run
analytics over the employee directory.((("analytics"))) Elasticsearch has functionality called
-_aggregations_, which ((("aggregations")))allow you to generate sophisticated analytics over your
-data. It is similar to `GROUP BY` in SQL, but much more powerful.
+_aggregations_, which ((("aggregations")))allow you to generate sophisticated analytics over your data. It is similar to `GROUP BY` in SQL, but much more powerful.
+最后,我们来到最后一个业务需求:允许管理层对雇员目录进行分析。Elasticsearch 有一个功能叫做_聚合_(aggregations),它允许你基于数据生成复杂的分析统计。它很像 SQL 中的 `GROUP BY`,但功能更强大。

For example, let's find the most popular interests enjoyed by our employees:
+举个例子,让我们找出雇员中最受欢迎的兴趣爱好:

[source,js]
--------------------------------------------------
@@ -21,6 +23,7 @@ GET /megacorp/employee/_search
// SENSE: 010_Intro/35_Aggregations.json

Ignore the syntax for now and just look at the results:
+暂时先忽略语法,直接看结果:

[source,js]
--------------------------------------------------
@@ -53,6 +56,7 @@ in sports.  These aggregations are not precalculated; they are generated on the fly from the documents that match the current query.
If we want to know the popular interests of people called Smith, we can just add the appropriate query into the mix:
+可以看到,两名雇员对音乐感兴趣,一名对林业感兴趣,一名对运动感兴趣。这些聚合结果不是预先计算好的,而是根据匹配当前查询的文档即时生成的。如果我们想知道叫 Smith 的人中最受欢迎的兴趣爱好,只需在其中加入合适的查询:

[source,js]
--------------------------------------------------
@@ -75,6 +79,7 @@ GET /megacorp/employee/_search
// SENSE: 010_Intro/35_Aggregations.json

The `all_interests` aggregation has changed to include only documents matching our query:
+`all_interests` 聚合已经变为只包含与我们的查询相匹配的文档:

[source,js]
--------------------------------------------------
@@ -95,6 +100,8 @@ The `all_interests` aggregation has changed to include only documents matching o

Aggregations allow hierarchical rollups too.((("aggregations", "hierarchical rollups in")))  For example, let's find the
average age of employees who share a particular interest:
+聚合还支持分级汇总。
+比如,让我们找出拥有某种特定兴趣的雇员的平均年龄:

[source,js]
--------------------------------------------------
@@ -116,6 +123,7 @@ GET /megacorp/employee/_search

The aggregations that we get back are a bit more complicated, but still
fairly easy to understand:
+返回的聚合结果稍微复杂了一些,但理解起来还是相当容易的:

[source,js]
--------------------------------------------------
@@ -151,6 +159,11 @@
The output is basically an enriched version of the first aggregation we ran.
We still have a list of interests and their counts, but now each interest has
an additional `avg_age`, which shows the average age for all employees having
that interest.
+输出基本上是我们之前运行的第一个聚合的加强版。
+我们仍然有一个兴趣及其数量的列表,只是每个兴趣现在都多了一个 `avg_age` 值,显示拥有该兴趣的所有雇员的平均年龄。

Even if you don't understand the syntax yet, you can easily see how complex
aggregations and groupings can be accomplished using this feature. The sky is
the limit as to what kind of data you can extract!
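The hierarchical rollup above (a terms bucket with an `avg` metric nested inside) can be imitated in a few lines of plain Python. This is only an illustrative sketch of what the aggregation computes, not how Elasticsearch computes it; the sample records and the `interests_with_avg_age` helper are assumptions, with the data mirroring the tutorial's three employees reduced to the fields the aggregation touches:

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the documents indexed earlier.
employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]

def interests_with_avg_age(docs):
    """Bucket documents by interest, then average the ages in each bucket."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in doc["interests"]:
            ages[interest].append(doc["age"])
    return {
        interest: {"doc_count": len(a), "avg_age": sum(a) / len(a)}
        for interest, a in ages.items()
    }
```

Running this reproduces the shape of the response shown above: one bucket per interest, each carrying a document count and an `avg_age` sub-value.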
+即使你还不理解这些语法,你也可以轻松看出,借助这个特性可以完成多么复杂的聚合与分组。
+至于能提取什么样的数据,没有任何限制!


From 67c9f9055e09c67280fe95f643a27a1e47b8c606 Mon Sep 17 00:00:00 2001
From: Josephjin
Date: Wed, 7 Sep 2016 10:45:22 +0800
Subject: [PATCH 6/8] chapter1:/010_intro/ 40_Tutorial_Conclusion.asciidoc

---
 010_Intro/40_Tutorial_Conclusion.asciidoc | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/010_Intro/40_Tutorial_Conclusion.asciidoc b/010_Intro/40_Tutorial_Conclusion.asciidoc
index a0e5394e4..5bb1fcbcf 100644
--- a/010_Intro/40_Tutorial_Conclusion.asciidoc
+++ b/010_Intro/40_Tutorial_Conclusion.asciidoc
@@ -1,11 +1,17 @@
=== Tutorial Conclusion
+=== 教程结语

Hopefully, this little tutorial was a good demonstration about what is possible
in Elasticsearch. It is really just scratching the surface, and many features--such as suggestions, geolocation, percolation, fuzzy and partial matching--were omitted to keep the tutorial short. But it did highlight just how easy
it is to start building advanced search functionality. No configuration was
needed--just add data and start searching!
+希望这个简短的教程很好地演示了 Elasticsearch 的种种可能。
+这真的只是冰山一角:为了保持教程简短,许多特性——比如建议(suggestions)、地理位置(geolocation)、过滤器(percolation)、模糊匹配与部分匹配——都被省略了。
+但它确实突显了开始构建高级搜索功能有多么容易:无需任何配置——只管添加数据并开始搜索!
+

It's likely that the syntax left you confused in places, and you may have
questions about how to tweak and tune various aspects. That's fine! The rest
of the book dives into each of these issues in detail, giving you a solid
understanding of how Elasticsearch works.
+很可能有些语法让你感到困惑,你可能也有关于如何调整和调优各方面的问题。没关系!本书的剩余部分将深入这些问题的细节,让你对 Elasticsearch 的工作原理有扎实的理解。

From 7399d163d944a3e9f3b9369db0fdfbdf1a9152b1 Mon Sep 17 00:00:00 2001
From: Josephjin
Date: Wed, 7 Sep 2016 10:45:35 +0800
Subject: [PATCH 7/8] chapter1:/010_intro/45_Distributed.asciidoc

---
 010_Intro/45_Distributed.asciidoc | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/010_Intro/45_Distributed.asciidoc b/010_Intro/45_Distributed.asciidoc
index 96f3ce298..9768be414 100644
--- a/010_Intro/45_Distributed.asciidoc
+++ b/010_Intro/45_Distributed.asciidoc
@@ -1,10 +1,14 @@
=== Distributed Nature
+=== 分布式特性

At the beginning of this chapter, we said that Elasticsearch((("distributed nature of Elasticsearch")))  can scale out to
hundreds (or even thousands) of servers and handle petabytes of data.  While
our tutorial gave examples of how to use Elasticsearch, it didn't touch on the
mechanics at all. Elasticsearch is distributed by nature, and it is designed
to hide the complexity that comes with being distributed.
+在本章开头,我们说过 Elasticsearch 可以横向扩展到数百(甚至数千)台服务器,处理 PB 级的数据。
+我们的教程虽然给出了 Elasticsearch 的使用示例,却完全没有触及其内部机制。
+Elasticsearch 天生就是分布式的,并且在设计上隐藏了分布式本身带来的复杂性。

The distributed aspect of Elasticsearch is largely transparent. Nothing in
the tutorial required you to know about distributed systems, sharding, cluster
@@ -12,24 +16,32 @@ discovery, or dozens of other distributed concepts. It happily ran the
tutorial on a single node living inside your laptop, but if you were to run
the tutorial on a cluster containing 100 nodes, everything would work in
exactly the same way.
+Elasticsearch 的分布式特性在很大程度上是透明的。教程中没有任何内容要求你了解分布式系统、分片、集群发现或其他数十个分布式概念。
+教程在你笔记本电脑里的单个节点上就能愉快地运行;而如果你要在一个包含 100 个节点的集群上运行它,一切的工作方式完全相同。

Elasticsearch tries hard to hide the complexity of distributed systems.
Here are some of the operations happening automatically under the hood:
+Elasticsearch 尽力隐藏分布式系统的复杂性。以下是一些在后台自动进行的操作:

* Partitioning your documents into different containers((("documents", "partitioning into shards")))((("shards"))) or _shards_, which
  can be stored on a single node or on multiple nodes
+ * 将你的文档分区到不同的容器或_分片_(shards)中,分片可以存储在单个节点或多个节点上;

* Balancing these shards across the nodes in your cluster to spread the
  indexing and search load
+ * 在集群的各节点间平衡这些分片,以分摊索引与搜索的负载;

* Duplicating each shard to provide redundant copies of your data, to prevent
  data loss in case of hardware failure
+ * 复制每个分片以提供数据的冗余副本,防止硬件故障导致数据丢失;

* Routing requests from any node in the cluster to the nodes that hold the
  data you're interested in
+ * 将来自集群中任一节点的请求路由到存有相关数据的节点;

* Seamlessly integrating new nodes as your cluster grows or redistributing
  shards to recover from node loss
+ * 随着集群的增长无缝地整合新节点,并在节点丢失时重新分配分片以恢复数据。

As you read through this book, you'll encounter supplemental chapters about the
distributed nature of Elasticsearch.  These chapters will teach you about
@@ -37,9 +49,10 @@ how the cluster scales and deals with failover (<>),
handles document storage (<>), executes distributed search (<>), and what a
shard is and how it works (<>).
+在阅读本书的过程中,你会遇到关于 Elasticsearch 分布式特性的补充章节。这些章节将介绍集群如何扩容与应对故障转移、如何处理文档存储、如何执行分布式搜索,以及分片是什么、它如何工作。

These chapters are not required reading--you can use Elasticsearch without
understanding these internals--but they will provide insight that will make
your knowledge of Elasticsearch more complete. Feel free to skim them and
revisit at a later point when you need a more complete understanding.
-
+这些章节并非必读——不理解这些内部机制你也可以使用 Elasticsearch——但它们提供的洞察会让你对 Elasticsearch 的认识更加完整。可以先略读它们,之后当你需要更透彻的理解时再回头细读。

From b4c0dd5c653918ec5a5f848e96e1c517f71d6ee3 Mon Sep 17 00:00:00 2001
From: Josephjin
Date: Wed, 7 Sep 2016 10:45:49 +0800
Subject: [PATCH 8/8] chapter1:/010_intro/50_Conclusion.asciidoc

---
 010_Intro/50_Conclusion.asciidoc | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/010_Intro/50_Conclusion.asciidoc b/010_Intro/50_Conclusion.asciidoc
index b386d3716..db1d2db88 100644
--- a/010_Intro/50_Conclusion.asciidoc
+++ b/010_Intro/50_Conclusion.asciidoc
@@ -1,13 +1,17 @@
=== Next Steps
+=== 后续步骤

By now you should have a taste of what you can do with Elasticsearch, and how
easy it is to get started. Elasticsearch tries hard to work out of the
box with minimal knowledge and configuration. The best way to learn
Elasticsearch is by jumping in: just start indexing and searching!
+现在你应该已经体会到可以用 Elasticsearch 做些什么,以及上手有多么容易。Elasticsearch 力求在只需最少知识和配置的情况下开箱即用。
+学习 Elasticsearch 最好的方式就是投入其中:开始索引和搜索吧!

However, the more you know about Elasticsearch, the more productive you can
become. The more you can tell Elasticsearch about the domain-specific elements
of your application, the more you can fine-tune the output.
+然而,你对 Elasticsearch 了解得越多,工作效率就越高。你告诉 Elasticsearch 的应用领域信息越多,就越能对输出进行调优。

The rest of this book will help you move from novice to expert. Each chapter
explains the essentials, but also includes expert-level tips.
If you're just getting started, these tips are probably not immediately relevant
@@ -15,3 +19,4 @@ to you; Elasticsearch has sensible defaults and will generally do the right
thing without any interference.  You can always revisit these chapters later,
when you are looking to improve performance by shaving off any wasted
milliseconds.
+本书的剩余部分将帮助你从新手成长为专家。每一章既讲解必要内容,也包含专家级的提示。如果你刚刚入门,这些提示也许暂时与你无关;Elasticsearch 有合理的默认值,通常无需干预就能正确工作。当你希望通过省去每一毫秒的浪费来提升性能时,随时可以回头重读这些章节。
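As a closing illustration for this patch series: the "Inverted index" definition added in 25_Tutorial_Indexing can be sketched in a few lines of plain Python. This toy version (hypothetical helper names; no analysis, positions, or relevance scoring, all of which Lucene adds) shows why every field that is indexed becomes searchable:

```python
def build_inverted_index(docs):
    """Map each term to the set of IDs of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, term):
    """Return the IDs of documents whose text contains the term."""
    return index.get(term.lower(), set())
```

With the tutorial's `about` texts ("I love to go rock climbing" and "I like to collect rock albums"), a search for "rock" finds both documents while "climbing" finds only the first — the same behavior the full-text search section demonstrated.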