
@ddelong ddelong commented Jun 14, 2024

Problem

Sometimes after a deploy, resource managers get stuck in a state where they can't support specific job types that require HDFS delegation. The failure shows up in those jobs as this error:

Cannot invoke "java.net.InetAddress.isAnyLocalAddress()" because the return value of "java.net.InetSocketAddress.getAddress()" is null

Recycling the resource managers seems to resolve the condition.

See also: https://git.hubteam.com/HubSpot/HadoopPlanning/issues/326

Solution

My intuition is that the domain name in these objects isn't resolving at the time of creation. We currently have no explanation for why that would happen, but this change gives us a way to confirm or rule it out while also improving the situation for customers. We reconstruct the InetSocketAddress to force resolution of the domain name so that the getAddress() call doesn't return null. We also add logging to gather data on how often this happens and whether it recovers as expected.
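A minimal sketch of the reconstruction idea, not the code merged in this PR. The class name `AddressResolution`, the method `ensureResolved`, and the use of `java.util.logging` are hypothetical choices for illustration, and port 8020 is only an example value. It relies on standard `java.net.InetSocketAddress` behavior: an unresolved address returns null from getAddress(), and the `(String, int)` constructor attempts a fresh DNS lookup.

```java
import java.net.InetSocketAddress;
import java.util.logging.Logger;

/** Hypothetical helper for illustration; names are assumptions, not the PR's code. */
final class AddressResolution {
    private static final Logger LOG = Logger.getLogger(AddressResolution.class.getName());

    private AddressResolution() {}

    /**
     * Returns the given address if it is already resolved. Otherwise rebuilds it
     * from its host string and port, which makes the JDK attempt resolution again.
     * The result may still be unresolved if the lookup fails a second time.
     */
    static InetSocketAddress ensureResolved(InetSocketAddress addr) {
        if (!addr.isUnresolved()) {
            return addr;
        }
        LOG.warning("Unresolved address " + addr + "; retrying DNS resolution");
        InetSocketAddress retried = new InetSocketAddress(addr.getHostString(), addr.getPort());
        if (retried.isUnresolved()) {
            LOG.warning("Address " + addr + " is still unresolved after retry");
        } else {
            LOG.info("Recovered address " + retried);
        }
        return retried;
    }

    public static void main(String[] args) {
        // createUnresolved skips DNS entirely, simulating the stuck condition.
        InetSocketAddress stale = InetSocketAddress.createUnresolved("localhost", 8020);
        System.out.println("before: getAddress() == null is " + (stale.getAddress() == null));

        InetSocketAddress fixed = ensureResolved(stale);
        System.out.println("after: isUnresolved() is " + fixed.isUnresolved());
    }
}
```

The two log paths mirror the two questions above: how often an unresolved address is seen, and whether a retry recovers it. Callers would still need to handle the case where the retried address remains unresolved.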

Testing

We will want to test this in a few clusters to verify there are no regressions, but otherwise this should be benign.

@ddelong ddelong self-assigned this Jun 14, 2024

@johnnysohn johnnysohn left a comment

🚢

@ddelong ddelong force-pushed the fix-null-address-issue branch from e4449f7 to 5e289f6 Compare June 17, 2024 14:49
@ddelong ddelong merged commit da7a7b2 into hubspot-3.3.6 Jun 18, 2024
4 participants