chapter240_part30:240_Stopwords/40_Divide_and_conquer.asciidoc #212

blogsit · 2016-08-23T02:04:08Z

1. The translated heading needs to be reworked; feedback is welcome.

xuej · 2016-08-24T02:44:47Z

240_Stopwords/40_Divide_and_conquer.asciidoc

-However, matching documents would not be required to contain all high-frequency terms.  If you would prefer all low- and high-frequency terms to be
-required, you should use a `bool` query instead.   As we saw in
-<<stopwords-and>>, this is already an efficient query.
+将操作符参数设置成 `and` 会要求所有低频词都必须匹配，同时对包含所有高频词的文档给予更高评分。但是，在匹配文档时，并不要求文档必须包含所有高频词，如果希望文档包含所有的低频和高频词，我们应该使用一个 `bool` 来替代。正如我们在 `and` 操作符（and Operator）<<stopwords-and>> 中看到的，它的查询效率已经很高了。


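如果希望文档包含所有的低频和高频词， => change the comma to a period?

Either a comma or a period works, I think. Going by the English original, this is described as a single sentence.

Sorry, I marked the wrong spot. I meant this sentence: “但是，在匹配文档时，并不要求文档必须包含所有高频词，” Change the comma at the end of it to a period?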

xuej · 2016-08-24T09:36:26Z

LGTM

medcl · 2016-10-22T12:57:19Z

Build failed.

luotitan · 2016-11-29T14:42:28Z

The local build did not pass. Could you paste the error output? The software is too large to install on my Mac.

blogsit · 2016-12-01T13:14:46Z

@luotitan Could you paste the error output? The software is too large to install on my Mac.

pengqiuyuan · 2016-12-04T14:35:50Z

240_Stopwords/40_Divide_and_conquer.asciidoc

-dead''.  This approach greatly reduces the number of documents that need to be
-examined and scored.
+`must` 意味着至少有一个低频词　&#x2014;`quick` 或者 `dead` &#x2014;须出现在被匹配文档中。所有其他的文档被排除在外。`should`　语句查找高频词 `and` 和 `the`，但也只是在`must` 语句查询的结果集文档中查询。
+`should`语句的唯一的工作就是在对如``Quick _and the_ dead''和``_The_ quick but　dead''语句进行评分时，前者得分比后者高。这种方式可以大大减少需要进行评分计算的文档数量。



must 意味着至少有一个低频词— quick 或者 dead —必须出现在被匹配文档中。所有其他的文档被排除在外。should　语句查找高频词 and 和 the，但也只是在 must 语句查询的结果集文档中查询。
should 语句的唯一的工作就是在对如 Quick _and the_ dead 和 _The_ quick but　dead 语句进行评分时，前者得分比后者高。这种方式可以大大减少需要进行评分计算的文档数量。

pengqiuyuan · 2016-12-04T14:36:21Z

240_Stopwords/40_Divide_and_conquer.asciidoc

-10 matches. We are really interested only in documents in which the terms all occur
-together, so in the case where there are no low-frequency terms, the query is
-rewritten to make all high-frequency terms required:
+当使用 `or`　查询高频词条((("stopwords", "low and high frequency terms", "only high frequency terms")))，如（&#x2014;``To be, or not to be''）&#x2014;进行查询时性能最差。只是为了返回最匹配的前十个结果就对只是包含这些词的所有文档进行评分是盲目的。我们真正的意图是查询整个&#x2014;（``To be, or not to be''）&#x2014;词条出现的文档，所以在这种情况下，不存低频所言，这个查询需要重写为所有高频词条都必须：


当使用 or　查询高频词条((("stopwords", "low and high frequency terms", "only high frequency terms")))，如— To be, or not to be —进行查询时性能最差。只是为了返回最匹配的前十个结果就对只是包含这些词的所有文档进行评分是盲目的。我们真正的意图是查询整个词条出现的文档，所以在这种情况下，不存低频所言，这个查询需要重写为所有高频词条都必须：

pengqiuyuan · 2016-12-04T14:57:16Z

240_Stopwords/40_Divide_and_conquer.asciidoc

-and less-important (high-frequency) terms.((("stopwords", "low and high frequency terms"))) Documents that match only the less
-important terms are probably of very little interest.  Really, we want
-documents that match as many of the more important terms as possible.
+在查询字符串中的词项可以分为更重要 (低频词) 和次重要 (高频词) 这两类。((("stopwords", "low and high frequency terms"))) 只与次重要词项匹配的文档很有可能不太相关。实际上，我们想要文档能尽可能多的匹配那些更重要的词项。


(低频词) (高频词): English parentheses are used here.

pengqiuyuan · 2016-12-04T14:58:22Z

240_Stopwords/40_Divide_and_conquer.asciidoc

-considered almost meaningless.  With the `stop` token filter, these domain-specific terms would have to be added to the stopwords list manually. However,
-because the `cutoff_frequency` looks at the actual frequency of terms in the
-index,  these words would be classified as _high frequency_ automatically.
+`cutoff_frequency` 配置的好处是，你在 _特定领域_ 使用停用词不受约束。((("domain specific stopwords")))((("stopwords", "domain specific")))例如,关于电影网站使用的词 _movie_ ，_color_ ，_black_ 和 _white_ ，这些词我们往往认为几乎没有任何意义。使用 `stop` 词汇单元过滤器，这些特定领域的词必须手动添加到停用词列表中。然而 `cutoff_frequency` 会查看索引里词项的具体频率，这些词会被自动归类为 _高频词汇_ 。


例如,关于电影网站使用的词
English comma used here.

关于电影网站使用的词 movie ，color ，black 和 white ，
Change to
关于电影网站使用的词 movie 、 color 、 black 和 white ，

pengqiuyuan · 2016-12-04T15:00:26Z

240_Stopwords/40_Divide_and_conquer.asciidoc


 *********************************************

-Take this query as an example:
+以下面查询为例:


以下面查询为例:
English colon used here.

pengqiuyuan · 2016-12-04T15:01:33Z

240_Stopwords/40_Divide_and_conquer.asciidoc

-<1> Any term that occurs in more than 1% of documents is considered to be high
-    frequency. The `cutoff_frequency` can be specified as a fraction (`0.01`)
-    or as an absolute number (`5`).
+<1> 任何词项出现在文档中超过1%，被认为是高频词。`cutoff_frequency` 配置可以指定为一个分数(`0.01`)或者一个正整数(`5`)。


English parentheses used here.

pengqiuyuan · 2016-12-04T15:01:53Z

240_Stopwords/40_Divide_and_conquer.asciidoc

-This query uses the `cutoff_frequency` to first divide the query terms into a
-low-frequency group (`quick`, `dead`) and a high-frequency group (`and`,
-`the`). Then, the query is rewritten to produce the following `bool` query:
+此查询通过`cutoff_frequency` 配置，将查询条件划分为低频组 (`quick`, `dead`)和高频组 (`and`,`the`)。然后，此查询会被重写为以下的`bool` 查询：


English parentheses used here.

pengqiuyuan · 2016-12-04T15:04:03Z

240_Stopwords/40_Divide_and_conquer.asciidoc

-document like ``Quick _and the_ dead'' higher than ``_The_ quick but
-dead''.  This approach greatly reduces the number of documents that need to be
-examined and scored.
+`must` 意味着至少有一个低频词&#x2014; `quick` 或者 `dead` &#x2014;必须出现在被匹配文档中。所有其他的文档被排除在外。`should`　语句查找高频词 `and` 和 `the`，但也只是在 `must` 语句查询的结果集文档中查询。


Formatting lost; change to
排除在外。 should 语句

Formatting lost; change to
高频词 and 和 the ，但也

Formatting lost; change to
但也只是在 must 语句
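
For readers following the two hunks above, the split driven by `cutoff_frequency` and the resulting `bool` query can be sketched roughly as below. This is a minimal Python illustration, not Elasticsearch's implementation: the helper names, the `text` field, and the document frequencies are all hypothetical, and the rule that a value below 1 is a fraction while anything else is an absolute count is an assumption made for the sketch.

```python
def split_by_cutoff(terms, doc_freq, total_docs, cutoff_frequency):
    """Split query terms into (low_frequency, high_frequency) lists."""
    # Assumption: a value below 1 is a fraction of all documents,
    # anything else is an absolute document count.
    if cutoff_frequency < 1:
        threshold = cutoff_frequency * total_docs
    else:
        threshold = cutoff_frequency
    low, high = [], []
    for term in terms:
        # A term found in more documents than the threshold is high frequency.
        (high if doc_freq.get(term, 0) > threshold else low).append(term)
    return low, high


def term_clauses(field, terms):
    """One term clause per query term."""
    return [{"term": {field: t}} for t in terms]


def rewrite_as_bool(field, low, high):
    """Build the bool query that the split terms are rewritten into."""
    if not low:
        # Only high-frequency terms: make all of them required.
        return {"bool": {"must": term_clauses(field, high)}}
    return {
        "bool": {
            # At least one low-frequency term must match.
            "must": {"bool": {"should": term_clauses(field, low)}},
            # High-frequency terms are optional and only affect scoring.
            "should": {"bool": {"should": term_clauses(field, high)}},
        }
    }


# Hypothetical document frequencies in an index of 10,000 documents.
doc_freq = {"quick": 50, "dead": 30, "and": 9000, "the": 9500}
low, high = split_by_cutoff(["quick", "and", "the", "dead"], doc_freq, 10_000, 0.01)
query = rewrite_as_bool("text", low, high)
```

With these made-up numbers, `quick` and `dead` land in the low-frequency group and `and` and `the` in the high-frequency group, matching the grouping quoted in the hunk.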

pengqiuyuan · 2016-12-04T15:07:20Z

240_Stopwords/40_Divide_and_conquer.asciidoc

-However, matching documents would not be required to contain all high-frequency terms.  If you would prefer all low- and high-frequency terms to be
-required, you should use a `bool` query instead.   As we saw in
-<<stopwords-and>>, this is already an efficient query.
+将操作符参数设置成 `and` 会要求所有低频词都必须匹配，同时对包含所有高频词的文档给予更高评分。但是，在匹配文档时，并不要求文档必须包含所有高频词。如果希望文档包含所有的低频和高频词，我们应该使用一个 `bool` 来替代。正如我们在 `and` 操作符（and Operator）<<stopwords-and>> 中看到的，它的查询效率已经很高了。


正如我们在 <<stopwords-and>> 中看到的，它的查询效率已经很高了。

pengqiuyuan · 2016-12-04T15:08:26Z

240_Stopwords/40_Divide_and_conquer.asciidoc

-<1> Because there are only two terms, the original 75% is rounded down
-    to `1`, that is: _one out of two low-terms must match_.
-<2> The high-frequency terms are still optional and used only for scoring.
+<1>　因为只有两个词，原来的75%向下取整为`1`，意思是：必须匹配低频词的两者之一。


Formatting lost; change to
因为只有两个词，原来的75%向下取整为 1 ，
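
The rounding described in callout `<1>` of this hunk is simple arithmetic, sketched below in Python. The function name is hypothetical, and the sketch only covers a positive percentage rounded down, as stated in the hunk.

```python
def required_matches(optional_clauses, percentage):
    """Clauses that must match for a positive percentage, rounded down."""
    # Integer arithmetic avoids floating-point surprises.
    return (optional_clauses * percentage) // 100


# Two low-frequency terms at 75%: 2 * 0.75 = 1.5, rounded down to 1.
print(required_matches(2, 75))
```

So with only two low-frequency terms, one of the two must match.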

medcl · 2016-12-30T07:46:49Z

@blogsit Please preview it locally yourself; there are still some minor issues.

blogsit · 2017-01-05T13:09:33Z

@medcl Could you post screenshots of the minor issues? I don't have a local build environment. Thanks.

medcl · 2017-01-05T15:18:27Z

First-draft translation of chapter 4

c34fe5b

blogsit added the to be review label Aug 23, 2016

Self-revision of the translation

067152e

xuej reviewed Aug 24, 2016

Review fixes

bbbe9be

xuej added to be final review and removed to be review labels Aug 24, 2016

medcl added to be improve and removed to be final review labels Oct 22, 2016

Fix build

1ebc87c

blogsit added to be final review and removed to be improve labels Nov 14, 2016

luotitan added to be improve and removed to be final review labels Nov 29, 2016

hua.chen added 2 commits December 4, 2016 22:11

Fix formatting issues

95800d4

Fix formatting issues

8d3c988

pengqiuyuan reviewed Dec 4, 2016

hua.chen added 2 commits December 4, 2016 22:44

Fix formatting issues

e990be7

Fix formatting issues

d86345a

pengqiuyuan reviewed Dec 4, 2016

Fix formatting issues

478cd2b

blogsit added to be final review and removed to be improve labels Dec 4, 2016

medcl added to be improve and removed to be final review labels Dec 30, 2016

Formatting

9b12fa0

blogsit added to be final review and removed to be improve labels Jan 6, 2017

medcl merged commit 614a94e into elasticsearch-cn:cn Jan 6, 2017
