DNS load-balancing is a useful technique whereby a DNS server may return multiple A (address) records (or AAAA records for IPv6) for a requested DNS name. The client's resolver library does what it will with this information, and there are many different resolvers and configurations for how they handle queries. However, most resolvers will simply pick one of the records for a request made by an application (e.g. via gethostbyname).
DNS load-balancing is often used as a "poor man's" load balancer: it is ultimately up to the client which endpoint gets picked, and a DNS response carries little information that most clients actually use. Another technique is network-routing load balancing, such as Border Gateway Protocol announcements (internal BGP or across the Internet) made with some awareness of network location. Given the client limitations above, DNS load-balancing is most nearly equivalent to BGP equal-cost multi-path (ECMP). Further, the time-to-live (TTL) used for DNS caching is usually relatively long, to lessen the load on authoritative DNS servers. This long-lived caching limits how rapidly advertised records can change: on a well-configured local network, changes can be expected to propagate through DNS servers within minutes, while on the broader Internet it is not unreasonable for changes to take days.
DNS load-balancing is helpful for giving one ingress hostname to multiple clusters serving the same service or resource. For example, when talking to https://apple.com your client is directed to one of numerous diverse clusters of webservers across the globe, but they all appear to serve the same data for your purposes.
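To see what a multi-record answer looks like from the client side, here is a minimal Go sketch (the hostname is illustrative) that prints every address the resolver returns; which one gets dialed is, again, up to the client:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// LookupHost returns every A/AAAA record the resolver hands back;
	// the client library decides which one to actually connect to.
	addrs, err := net.LookupHost("one.one.one.one")
	if err != nil {
		panic(err)
	}
	for _, addr := range addrs {
		fmt.Println(addr)
	}
}
```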
HTTP persistent connections (keep-alive) are a technique, used with HTTP/1.1 and later, whereby a web client requests multiple resources over a single TCP connection to a web server. This has significant latency advantages in particular, and most modern web clients and web servers implement some support for persistent connections.
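A quick way to watch persistent connections from Go is net/http/httptrace. This sketch (the URL is illustrative) makes three sequential requests with the default client; the second and third should report reused=true because the TCP connection came from the idle pool:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptrace"
)

func main() {
	// GotConn fires once per request; Reused tells us whether the
	// connection was pulled from the client's keep-alive pool.
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			fmt.Printf("reused=%v remote=%s\n", info.Reused, info.Conn.RemoteAddr())
		},
	}
	for i := 0; i < 3; i++ {
		req, err := http.NewRequest("GET", "https://example.com/", nil)
		if err != nil {
			panic(err)
		}
		req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			panic(err)
		}
		// Draining and closing the body is what returns the connection
		// to the idle pool for the next iteration.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
}
```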
Those using DNS load-balancing with Kubernetes clusters may not realize that their Go (and other languages') web clients are likely using HTTP persistent connections, which is usually a good thing! What is easy to miss when DNS load-balancing across multiple Kubernetes clusters is the topology: the service-mesh gateway (Traefik, Istio) is the web server your client actually holds a keep-alive connection to, and the gateway in turn likely holds its own keep-alive connections to your service running in a Kubernetes pod.
If one is using DNS load-balancing and wants to decommission a Kubernetes cluster for a service advertised in DNS, one would usually remove the cluster's ingress hostname from the service hostname's records. The obvious caveat is that clients which are not re-querying DNS (e.g. they have the A/AAAA record cached) will not see the change until they next query DNS. The unobvious caveat involves HTTP persistent connections for long-running services: if you then decommission the service on that Kubernetes cluster (e.g. set replicas to zero or delete the deployment), your clients will start getting HTTP 503s from the ingress they are still connected to, which no longer has any backing service to route requests to. In the case of a Go client at least, unless the client disconnects and reconnects, it will continue to get 503s indefinitely.
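If you control the client, one possible mitigation is to bound how long idle connections live and to drop the connection pool whenever a 5xx shows up, so the next request re-resolves DNS and dials fresh. A sketch under those assumptions (the hostname and retry policy are illustrative, not the test code below):

```go
package main

import (
	"io"
	"log"
	"net/http"
	"time"
)

func main() {
	transport := &http.Transport{
		// Bound reuse of idle keep-alive connections so DNS changes are
		// eventually picked up even when no errors occur.
		IdleConnTimeout: 30 * time.Second,
	}
	client := &http.Client{Transport: transport, Timeout: 5 * time.Second}

	for i := 0; i < 30; i++ {
		resp, err := client.Get("http://one.one.one.one/")
		if err != nil {
			log.Printf("request failed: %v", err)
		} else {
			io.Copy(io.Discard, resp.Body)
			resp.Body.Close()
			log.Printf("status=%d", resp.StatusCode)
			if resp.StatusCode >= 500 {
				// Drop pooled connections: the next Get re-resolves
				// DNS and dials a (possibly different) endpoint.
				transport.CloseIdleConnections()
			}
		}
		time.Sleep(time.Second)
	}
}
```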
To see this behavior yourself on a Linux host, one can use this proof-of-concept code. Follow these steps:
Running this test requires a few system-level changes:
- Configure your machine's DNS server to be localhost, which will break most of your outbound network requests (steps below)
- Allow the built Go binary to bind to privileged ports 53 (DNS) and 80 (HTTP) (the Makefile handles this on Linux)
The Go program here offers three components:
- DNS Server
- HTTP Server
- HTTP Client
The test runs three distinct processes:
- A DNS server, which answers that one.one.one.one is your local host (the test web server) and, after 5 seconds, starts resolving one.one.one.one to Cloudflare's real 1.1.1.1, which should reliably return an HTTP 200.
- An HTTP server, which answers on port 80 with an HTTP 200 for 10 seconds and then starts returning 503s (sketched below).
- An HTTP client, which requests http://one.one.one.one for 25 seconds.
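For reference, the web-server half can be approximated with just the standard library. A sketch (the actual server/server.go may differ) that answers 200 for the first 10 seconds and 503 afterwards, without dropping established keep-alive connections:

```go
package main

import (
	"log/slog"
	"net/http"
	"time"
)

func main() {
	start := time.Now()
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		slog.Info("Got request", "time", time.Now().UTC())
		if time.Since(start) > 10*time.Second {
			// The "backing service" is gone: keep answering on the
			// open keep-alive connections, but only with 503s.
			slog.Info("Sending", "code", http.StatusServiceUnavailable)
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		slog.Info("Sending", "code", http.StatusOK)
		w.Write([]byte("hello~\n")) // implicit 200
	})
	// Binding port 80 needs privileges; the Makefile grants the built
	// binary the required capability on Linux.
	slog.Error("server exited", "err", http.ListenAndServe(":80", nil))
}
```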
To build the Go code with an acceptable version of Go in your path, run:
$ make build
$ sudo mv /etc/resolv.conf /etc/resolv.conf.bk
$ echo "nameserver 127.0.0.1" | sudo tee /etc/resolv.conf
$ make test
Initial Client DNS Query
time=2025-05-31T20:29:42.880-06:00 level=INFO source=dns/server.go:29 msg="Using local records"
time=2025-05-31T20:29:42.880-06:00 level=INFO source=dns/server.go:35 msg="Query for" opcode=0
time=2025-05-31T20:29:42.880-06:00 level=INFO source=dns/server.go:41 msg="Query for" record=one.one.one.one.
time=2025-05-31T20:29:42.880-06:00 level=INFO source=dns/server.go:45 msg="Found answer sending" record=one.one.one.one. answer=127.0.0.1
time=2025-05-31T20:29:42.880-06:00 level=INFO source=dns/server.go:29 msg="Using local records"
time=2025-05-31T20:29:42.880-06:00 level=INFO source=dns/server.go:35 msg="Query for" opcode=0
time=2025-05-31T20:29:42.881-06:00 level=INFO source=dns/server.go:29 msg="Using local records"
time=2025-05-31T20:29:42.881-06:00 level=INFO source=dns/server.go:35 msg="Query for" opcode=0
Note: You should see no more DNS requests unless the Go client errors, or fails to read the whole response.Body stream and close it.
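Concretely, the drain-and-close pattern that keeps a Go client's connection eligible for reuse looks like this (fetchOnce is an illustrative name, not from the test code):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetchOnce returns the status code of one GET. Go only returns the
// underlying TCP connection to the keep-alive pool once the response
// body has been fully read and closed.
func fetchOnce(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		// A transport error discards the connection; the next request
		// dials (and resolves DNS) again.
		return 0, err
	}
	io.Copy(io.Discard, resp.Body) // drain any unread bytes
	resp.Body.Close()              // then close => connection is reusable
	return resp.StatusCode, nil
}

func main() {
	status, err := fetchOnce(http.DefaultClient, "http://one.one.one.one/")
	fmt.Println(status, err)
}
```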
Initial Webserver Responses
time=2025-05-31T20:29:42.882-06:00 level=INFO source=server/server.go:19 msg="Got request" time=2025-06-01T02:29:42.882Z
time=2025-05-31T20:29:42.882-06:00 level=INFO source=server/server.go:24 msg=Sending code=200
Initial Client Responses
2025/05/31 20:29:42 "GET http://one.one.one.one/ HTTP/1.1" from 127.0.0.1:33132 - 200 7B in 237.205µs
time=2025-05-31T20:29:42.882-06:00 level=INFO source=client/client.go:32 msg="KeepAlive: Got Response" status=200 len=7 time=2025-06-01T02:29:42.878Z
Verify HTTP persistent connections
To verify we are using a single persistent connection, we use netstat to see connections to port 80:
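The exact invocation is not critical; run as an unprivileged user (hence the warning below), something like the following will do:

$ netstat -tnp | grep ':80'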
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 127.0.0.1:80 127.0.0.1:33132 ESTABLISHED -
tcp 0 0 127.0.0.1:33132 127.0.0.1:80 ESTABLISHED -
Eventual Server Responses and Client Responses
Once the timeline has passed, we see 503s returned:
time=2025-05-31T20:37:28.522-06:00 level=INFO source=server/server.go:19 msg="Got request" time=2025-06-01T02:37:28.522Z
time=2025-05-31T20:37:28.523-06:00 level=INFO source=server/server.go:22 msg=Sending code=503
2025/05/31 20:37:28 "GET http://one.one.one.one/ HTTP/1.1" from 127.0.0.1:51564 - 503 0B in 69.584µs
time=2025-05-31T20:37:28.523-06:00 level=INFO source=client/client.go:32 msg="KeepAlive: Got Response" status=503 len=0 time=2025-06-01T02:37:28.522Z
End of Test
After 25 seconds, the Makefile will verify the DNS record for one.one.one.one changed from 127.0.0.1 to 1.1.1.1 and kill all the test processes:
time=2025-05-31T20:37:28.538-06:00 level=INFO source=dns/server.go:32 msg="Using normal records"
time=2025-05-31T20:37:28.538-06:00 level=INFO source=dns/server.go:35 msg="Query for" opcode=0
time=2025-05-31T20:37:28.538-06:00 level=INFO source=dns/server.go:41 msg="Query for" record=one.one.one.one.
time=2025-05-31T20:37:28.538-06:00 level=INFO source=dns/server.go:45 msg="Found answer sending" record=one.one.one.one. answer=1.1.1.1
1.1.1.1
Test timeout
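When you are done, restore your original resolver configuration:

$ sudo mv /etc/resolv.conf.bk /etc/resolv.conf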