I figured I should start a series of short notes (can't call them articles) about random stuff I encounter in my day-to-day job that took me some time to figure out. Here we go.

Elasticsearch tip #1:

TL;DR: make use of the network.publish_host setting when deploying ES to multi-homed servers

Let's say you're deploying an ES (Elasticsearch) cluster.

In a production environment it's not uncommon to have multi-homed servers (that is, servers with multiple network interfaces connected to different networks/VLANs/etc).

You've installed ES using your package manager of choice and you have set up your elasticsearch.yml. Multicast is disabled, and you have something like this for your unicast discovery:

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["server1", "server2"]

Everything looks fine, you start your ES nodes... and they don't see each other. Just (maybe) some timeouts in the logs.

There's no firewall between the nodes, routing is all right, it looks like you can connect from one node to the other, and vice versa, but it still won't work...

This will save your day:

network.publish_host: this.servers.hostname

or you can use an IP as well...
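To make that concrete, here is a rough sketch of what the relevant part of elasticsearch.yml could look like on server1 when you go the IP route. The address 10.0.1.11 is made up for illustration; use whatever address server1 has on the network the nodes actually share.

```yaml
# elasticsearch.yml on server1 (sketch only; 10.0.1.11 is a hypothetical
# address on the network that the cluster nodes share)
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["server1", "server2"]

# Address this node advertises to the other nodes in the cluster
network.publish_host: 10.0.1.11
```

Each node gets its own value here, so server2 would publish its own address on that same network. It's probably also worth checking that the names in the unicast hosts list resolve to addresses on that network.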

According to the official documentation at elastic.co:

(edited) The network.publish_host setting allows to control the host the node will publish itself within the cluster so other nodes will be able to connect to it. Of course, this can’t be '', and by default, it will be the first non loopback address (if possible), or the local address.

What happened is that the first non-loopback address might be on a VLAN that doesn't allow the nodes to talk to each other (even if your connectivity tests worked, because the default route goes through a different interface).