Looks like most of the stack pods fail to start after pod termination because DNS entries are not registered fast enough in the headless services (the delay depends on the cloud vendor; in GKE it can be up to 60s), for example:
hdfs-namenode-0 namenode java.lang.IllegalArgumentException: java.net.UnknownHostException: hdfs-namenode
This may also affect other services (such as Kafka).
Fix:
- Set spec.publishNotReadyAddresses: true in the headless services, which forces DNS records to be published even for pods that are not yet ready. Readiness is based on the pods' readiness/liveness probes; only when they pass is a pod added to the endpoints. In this setup, publishing DNS entries for not-ready pods is not an issue; in fact it is expected, given the way the Hadoop stack was designed. The DNS entries should be registered right away, while the Java applications themselves handle the actual availability of the processes within the pods.
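As a sketch, assuming a headless Service named hdfs-namenode fronting the namenode StatefulSet (the actual service/port names in the chart may differ), the fix would look like:

```yaml
# Headless Service for the HDFS namenode (illustrative; the field names are
# standard Kubernetes, but the service name, selector, and port are assumptions).
apiVersion: v1
kind: Service
metadata:
  name: hdfs-namenode
spec:
  clusterIP: None                  # headless: DNS returns the pod A records directly
  publishNotReadyAddresses: true   # register DNS entries even before probes pass
  selector:
    app: hdfs-namenode
  ports:
    - name: rpc
      port: 8020
```

With publishNotReadyAddresses: true, a name such as hdfs-namenode-0.hdfs-namenode should resolve as soon as the pod exists, instead of only after the readiness probe succeeds, which avoids the UnknownHostException above.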
Reference:
It's not DNS
There's no way it's DNS
It was DNS