[[important-configuration-changes]]
=== Important Configuration Changes

Elasticsearch ships with _very good_ defaults,((("deployment", "configuration changes, important")))((("configuration changes, important"))) especially when it comes to performance-related settings and options. When in doubt, just leave the settings alone. We have witnessed countless clusters ruined by errant settings, because an administrator thought that turning a knob would bring a 100-fold improvement.

[NOTE]
====
Please read this entire section! All configurations presented are equally important and are not listed in any particular order. Please read through all of the options and apply them to your cluster.
====

Other databases may require tuning, but by and large, Elasticsearch does not. If you are hitting performance problems, the solution is usually better data layout or more nodes. There are very few "magic knobs" in Elasticsearch; if there were, we'd have turned them already!

With that said, there are some _logistical_ configurations that should be changed for production. These changes are necessary either to make your life easier, or because there is no way to set a good default (it depends on your cluster layout).

==== Assign Names

Elasticsearch by default starts a cluster named `elasticsearch`.((("configuration changes, important", "assigning names"))) It is wise to rename your production cluster to something else, simply to prevent accidents whereby someone's laptop joins the cluster. A simple change to `elasticsearch_production` can save a lot of heartache.

This can be changed in your `elasticsearch.yml` file:

[source,yaml]
----
cluster.name: elasticsearch_production
----

Similarly, it is wise to change the names of your nodes. As you have probably noticed by now, Elasticsearch assigns a random Marvel superhero name to each node at startup. This is cute in development--but less cute when it is 3 a.m. and you are trying to remember which physical machine was Tagak the Leopard Lord.

More important, since these names are generated on startup, each time you restart a node it gets a new name. This can make logs confusing, since the names of all the nodes are constantly changing.

Boring as it might be, we recommend you give each node a name that makes sense to you--a plain, descriptive name. This is also configured in your `elasticsearch.yml`:

[source,yaml]
----
node.name: elasticsearch_005_data
----
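Once the names are set, it is easy to confirm that they took effect. As a minimal sketch, the first request below lists every node in the cluster by name, and the second returns (among other things) the `cluster_name`; the exact columns and fields in the responses vary between Elasticsearch versions:

[source,js]
----
GET /_cat/nodes?v

GET /_cluster/health
----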
==== Paths

By default, Elasticsearch will place the plug-ins,((("configuration changes, important", "paths")))((("paths"))) logs, and--most important--your data in the installation directory. This can lead to unfortunate accidents, whereby the installation directory is accidentally overwritten by a new installation of Elasticsearch. If you aren't careful, you can erase all of your data.

Don't laugh--we've seen it happen more than a few times.

The best thing to do is relocate your data directory outside the installation location. You can optionally move your plug-in and log directories as well.

This can be changed as follows:

[source,yaml]
----
path.data: /path/to/data1,/path/to/data2 <1>

# Path to log files:
path.logs: /path/to/logs

# Path to where plugins are installed:
path.plugins: /path/to/plugins
----
<1> Notice that you can specify more than one directory for data by using comma-separated lists.

Data can be saved to multiple directories, and if each directory is mounted on a different hard drive, this is a simple and effective way to set up a software RAID 0: Elasticsearch automatically stripes data across the different directories, boosting performance.
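For example, if a machine has three drives mounted separately, the data path could point at one directory on each mount. The mount points below are purely hypothetical and only illustrate the layout:

[source,yaml]
----
# Each directory lives on its own physical drive (hypothetical mount points):
path.data: /mnt/disk1/elasticsearch,/mnt/disk2/elasticsearch,/mnt/disk3/elasticsearch
----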
.Multiple data path safety and performance
[WARNING]
====================
Like any RAID 0 configuration, only a single copy of your data is saved to the hard drives. If you lose a hard drive, you are _guaranteed_ to lose a portion of the data on that machine. With luck you will have replicas elsewhere in the cluster that can recover the data, and/or a recent backup.

Elasticsearch attempts to minimize the extent of data loss by striping entire shards to a drive. That means that `Shard 0` will be placed entirely on a single drive. Elasticsearch will not stripe a shard across multiple drives, since the loss of one drive would corrupt the entire shard.

This has ramifications for performance: if you are adding multiple drives to improve the performance of a single index, it is unlikely to help, since most nodes will have only one shard, and thus only one active drive. Multiple data paths help only if you have many indices/shards on a single node.

Multiple data paths is a nice convenience feature, but at the end of the day, Elasticsearch is not a software RAID package. If you need more advanced configuration, robustness, and flexibility, we encourage you to use an actual software RAID package instead of the multiple data path feature.
====================

==== Minimum Master Nodes

The `minimum_master_nodes` setting is _extremely_ important to the stability of your cluster.((("configuration changes, important", "minimum_master_nodes setting")))((("minimum_master_nodes setting"))) This setting helps prevent _split brains_, the existence of two masters in a single cluster.

When you have a split brain, your cluster is in danger of losing data. Because the master is considered the supreme ruler of the cluster, it decides when new indices can be created, how shards are moved, and so forth. If you have _two_ masters, data integrity becomes perilous, since you have two nodes that both think they are in charge.

This setting tells Elasticsearch not to elect a master unless there are enough master-eligible nodes available. Only then will an election take place.

This setting should always be configured to a quorum (majority) of your master-eligible nodes.((("quorum"))) A quorum is `(number of master-eligible nodes / 2) + 1`. Here are some examples:

- If you have ten regular nodes (can hold data, can become master), a quorum is `6`.
- If you have three dedicated master nodes and a hundred data nodes, the quorum is `2`, since you need to count only the nodes that are master eligible.
- If you have two regular nodes, you are in a conundrum. A quorum would be `2`, but this means a loss of one node will make your cluster inoperable. A setting of `1` will allow your cluster to function, but doesn't protect against split brain. It is best to have a minimum of three nodes in situations like this.

This setting can be configured in your `elasticsearch.yml` file:

[source,yaml]
----
discovery.zen.minimum_master_nodes: 2
----
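While you are reasoning about quorums and elections, it can help to see which node currently holds the master role. A quick, read-only check is the `_cat/master` endpoint, shown here as a sketch; the columns in the response differ slightly between versions:

[source,js]
----
GET /_cat/master?v
----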
But because Elasticsearch clusters are dynamic, you can easily add or remove nodes, and that changes the quorum. It would be extremely irritating if you had to push a new configuration to each node and restart your whole cluster just to change this setting.

For this reason, `minimum_master_nodes` (and other settings) can be configured via a dynamic API call. You can change the setting while your cluster is online:

[source,js]
----
PUT /_cluster/settings
{
    "persistent" : {
        "discovery.zen.minimum_master_nodes" : 2
    }
}
----

This will become a persistent setting that takes precedence over whatever is in the static configuration. You should modify this setting whenever you add or remove master-eligible nodes.
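After issuing the call, you may want to confirm that the value was actually stored. The cluster settings API can be read as well as written; a minimal check looks like this (the response formatting varies by version):

[source,js]
----
GET /_cluster/settings
----

Values stored under `persistent` survive a full cluster restart, whereas values stored under `transient` do not.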
==== Recovery Settings

Several settings affect the behavior of shard recovery when your cluster restarts.((("recovery settings")))((("configuration changes, important", "recovery settings"))) First, we need to understand what happens if nothing is configured.

Imagine you have ten nodes, and each node holds a single shard--either a primary or a replica--in a 5 primary / 1 replica index. You take your entire cluster offline for maintenance (installing new drives, for example). When you restart the cluster, it just so happens that five nodes come online before the other five.

Maybe the switch to the other five is being flaky and they didn't receive the restart command right away. Whatever the reason, you have five nodes online. These five nodes will gossip with each other, elect a master, and form a cluster. They notice that data is no longer evenly distributed, since five nodes are missing from the cluster, and they immediately start replicating new shards between each other.

Finally, your other five nodes turn on and join the cluster. These nodes see that _their_ data is being replicated to other nodes, so they delete their local data (since it is now redundant, and may be outdated). Then the cluster starts to rebalance even more, since the cluster size just went from five to ten.

During this whole process, your nodes are thrashing the disk and the network, moving data around--for no good reason. For large clusters with terabytes of data, this useless shuffling of data can take a _really long time_. If all the nodes had simply waited for the cluster to come online, all the data would have been local and nothing would have needed to move.

Now that we know the problem, we can configure a few settings to alleviate it. First, we need to give Elasticsearch a hard limit:

[source,yaml]
----
gateway.recover_after_nodes: 8
----

This will prevent Elasticsearch from starting a recovery until at least eight (data or master) nodes are present. The value for this setting is a matter of personal preference: how many nodes do you want present before you consider your cluster functional? In this case we are setting it to `8`, which means the cluster is inoperable unless there are at least eight nodes.

Then we tell Elasticsearch how many nodes _should_ be in the cluster, and how long we are willing to wait for all of those nodes:

[source,yaml]
----
gateway.expected_nodes: 10
gateway.recover_after_time: 5m
----

What this means is that Elasticsearch will do the following:

- Wait for eight nodes to be present
- Begin recovering after 5 minutes _or_ after ten nodes have joined the cluster, whichever comes first.

These three settings allow you to avoid the excessive shard swapping that can occur on cluster restarts. It can literally make recovery take seconds instead of hours.

NOTE: These settings can only be set in the `config/elasticsearch.yml` file or on the command line (they are not dynamically updatable), and they are only relevant during a full cluster restart.
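When the cluster does come back, you can watch the recovery itself rather than guessing at its progress. The `_cat/recovery` endpoint lists per-shard recovery status; this is a read-only sketch and the exact columns depend on your Elasticsearch version:

[source,js]
----
GET /_cat/recovery?v
----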
[[unicast]]
==== Prefer Unicast over Multicast

Elasticsearch is configured to use unicast discovery out of the box to prevent nodes from accidentally joining a cluster. Only nodes running on the same machine will automatically form a cluster.

While multicast is still https://www.elastic.co/guide/en/elasticsearch/plugins/current/discovery-multicast.html[provided as a plugin], it should never be used in production. The last thing you want is for nodes to accidentally join your production network, simply because they received an errant multicast ping. There is nothing wrong with multicast _per se_; it simply leads to silly problems and can be a bit more fragile (for example, a network engineer fiddles with the network without telling you--and all of a sudden nodes can't find each other anymore).

To use unicast, you provide Elasticsearch a list of nodes that it should try to contact. When a node contacts a member of the unicast list, it receives a full cluster state that lists all of the nodes in the cluster. It then contacts the master and joins the cluster.

This means your unicast list does not need to include all of the nodes in your cluster. It just needs enough nodes that a new node can find someone to talk to. If you use dedicated masters, just list your three dedicated masters and call it a day. This setting is configured in `elasticsearch.yml`:

[source,yaml]
----
discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]
----

For more information about how Elasticsearch nodes find each other, see https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-zen.html[Zen Discovery] in the Elasticsearch Reference.
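Pulling the section together, a minimal production-oriented `elasticsearch.yml` that applies everything discussed above might look like the sketch below. The values are simply the examples used throughout this section, and the unicast hosts are placeholders; adjust every one of them to your own cluster layout:

[source,yaml]
----
cluster.name: elasticsearch_production
node.name: elasticsearch_005_data

# Keep data, logs, and plugins out of the installation directory
path.data: /path/to/data1,/path/to/data2
path.logs: /path/to/logs
path.plugins: /path/to/plugins

# Quorum of master-eligible nodes: (master-eligible nodes / 2) + 1
discovery.zen.minimum_master_nodes: 2

# Hold off recovery until the cluster has actually formed
gateway.recover_after_nodes: 8
gateway.expected_nodes: 10
gateway.recover_after_time: 5m

# Unicast discovery: list a few stable nodes, such as your dedicated masters
discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]
----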