In cloud infrastructure, DNS manages not only names but also, indirectly, the actual path traffic takes. The same app.example.com name can direct an external user to a public load balancer, while a service inside a VPC or VNet receives a private endpoint. DNS therefore needs to be designed not around the record itself, but around the question: who is making the query, and what address should be returned?
The basic model is as follows:
Public DNS serves the external-facing perimeter: users, partners, public APIs, CDN, and external load balancers.
Private DNS zones serve internal networks: VPC, VNet, project network, hybrid connections, private databases, and internal services.
Split-horizon DNS allows a single name to return different responses depending on the source of the query: externally, a public endpoint; internally, a private one.
TTL defines how long a DNS response can remain in the cache. It is a trade-off between the speed of changes and cache stability.
DNS failover routes new DNS queries to a backup endpoint, but it does not work like an instant switch.
The main risk is that a DNS response depends on more than just the record in the zone. The result is affected by the query source, the private zone’s association with the network, the recursive resolver, the OS cache, the application cache, the TTL, and the state of the service. As a result, checking DNS from a developer’s laptop does not prove that a pod in Kubernetes, a VM in a private subnet, or a service from a hybrid network will receive the same response.
TTL also should not be treated as a guaranteed failover time. If a record has been in use for a long time with a TTL of 3600, lowering the TTL to 60 seconds immediately before an incident will not clear caches that already contain the previous response. A low TTL helps well-behaved clients replace stale responses more quickly, but it increases the number of DNS queries and does not control connections that are already open.
DNS failover is useful as part of a resilience strategy, but it is not a substitute for a DR plan. It can change the response for new name resolutions, but it will not restore data, perform replication, guarantee RPO/RTO, or resolve application-level issues. For critical systems, DNS should be one layer in the overall recovery architecture, alongside load balancers, health checks, data replication, failover procedures, and regular testing.
Why DNS design should start with the query source
In a simple infrastructure, DNS is often treated as a directory: a name maps to an IP address. In the cloud, this model quickly becomes too limited. A company may have a public API, internal services in a VPC or VNet, private databases, multiple regions, and hybrid connectivity to an office or data center network.
In this type of environment, DNS starts to affect the traffic path. The same FQDN may resolve differently from the internet, from a private subnet, from a Kubernetes cluster, or from a hybrid network. An error in a zone, a private DNS association, or a TTL value can send a service to the wrong endpoint, increase latency, expose an unnecessary public route, or break failover.
That is why design should begin not with choosing an A, CNAME, or Alias record, but with the basic name resolution model: who makes the DNS query, which resolver it passes through, which zone should respond, and which endpoint is considered correct.
Public DNS and private DNS: who is querying and which endpoint receives the request
The key question in cloud DNS is not “what is the zone called,” but “where did the query come from.” A DNS name by itself does not determine the route. The same FQDN can return different answers depending on which resolver handles the query: a public resolver on the internet, or a private resolver inside a VPC, VNet, or hybrid network.
Public DNS: an external client receives a public endpoint
For an internet user, www.example.com should typically resolve to a public entry point: a CDN, a CNAME to an external service, a public IP address, or a public load balancer. This response is returned by a public DNS zone accessible from the internet.
Internet user
→ recursive resolver
→ public DNS zone
→ public endpoint / CDN / public load balancer
This is the external perimeter. Its purpose is to direct users, partners, and external systems to the public application or API correctly.
Private DNS: an internal service gets a private endpoint
Another example is db.internal.example.com. External users do not need this name. It is queried by a VM, a container, a Kubernetes workload, or an internal service in a VPC/VNet.
The response is usually the database’s private IP address, a private load balancer, an internal API, or an address used for service discovery. This response is returned by a private DNS zone linked to specific networks.
Service in VPC
→ cloud/private resolver
→ private DNS zone
→ private IP / private load balancer
This leads to an important practical implication: checking DNS from a developer’s laptop does not prove that a service inside a VPC will receive the same response. Queries go through different resolvers and may end up in different zones. DNS should therefore be checked from the environment where the client actually runs: from an external user’s browser, from the application container, from a VM in the appropriate subnet, or from a connected hybrid network.
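One way to run this check from the client's own environment is a small script executed inside the container, VM, or pod in question. The sketch below is illustrative rather than a definitive tool: the function names are hypothetical, and it relies on whatever resolver the host is configured to use. It resolves a name and labels each returned address as private or public.

```python
import ipaddress
import socket


def classify_address(addr: str) -> str:
    """Label an IP address as 'private' or 'public'.

    Note: ipaddress.is_private also covers loopback, link-local, and
    documentation ranges, so treat the label as a first-pass hint only.
    """
    ip = ipaddress.ip_address(addr)
    return "private" if ip.is_private else "public"


def resolve_and_classify(name: str) -> dict[str, str]:
    """Resolve `name` through this host's configured resolver and label each address."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    return {addr: classify_address(addr) for addr in addresses}


# Example call, to be run from the environment under test
# (the name is the illustrative one used in this article):
#   resolve_and_classify("app.example.com")
```

Running the same call from a laptop and from a workload inside the VPC makes any difference between the two resolution paths visible.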
How to avoid mixing up public, private, and split-horizon DNS
To avoid confusing DNS resolution models, it is useful to compare them by the source of the request and the type of endpoint returned:
Approach | Who makes the request | Which endpoint is returned | Main risk
--- | --- | --- | ---
Public DNS | A user or external service on the internet | Public IP, CDN, CNAME, public load balancer | Routing external traffic to the wrong destination or exposing unnecessary public names
Private DNS | A service inside a VPC/VNet, project network, or hybrid network | Private IP, private load balancer, internal API, or service discovery address | Testing from the wrong network and getting a misleading result
Split-horizon DNS | External and internal clients for the same FQDN | Different responses in the public and private environments | Receiving an unexpected response because the wrong zone or request source is used
Public and private DNS solve different problems. The former serves the external environment, while the latter serves internal networks. Private DNS, however, is not a complete security control for a service: it hides records from public DNS, but it does not replace IAM, firewalls/security groups, network policies, or application-level access control.
This separation provides architectural clarity: public names point to public entry points, while private names point to internal resources. In real cloud architectures, however, external and internal clients often need to use the same FQDN. This is where split-horizon DNS comes in: a single name exists in two views and returns different responses depending on where the request comes from.
Split-horizon DNS: one name, different responses from different networks
How split-horizon works
Split-horizon DNS, or split-view DNS, means that a single fully qualified domain name exists in two views: public and private. The response depends not on the name itself, but on which resolver and which zone the query passes through.
The flow looks like this:
Internet user
→ public resolver
→ public DNS zone
→ app.example.com
→ public endpoint

Service in VPC/VNet
→ private resolver
→ private DNS zone
→ app.example.com
→ private endpoint
The same FQDN does not imply the same DNS response. That is why you need to test not just the name itself, but the name from a specific network context: from the internet, from a VPC/VNet, from a Kubernetes cluster, from a VM in the target subnet, or from a hybrid network.
A practical example is a SaaS application with the name api.example.com. Externally, this name points to a public load balancer or CDN. Inside the cloud, the same name returns the address of a private load balancer or internal API. This is convenient for applications: there is no need to maintain separate public-api.example.com and internal-api.example.com names, and less conditional logic is required in the configuration.
Where split-horizon breaks down
The main risk comes from the overlap between the public and private zones. If a private zone is not associated with the required VPC or VNet, a service inside the cloud may not receive the private answer and may get the public endpoint instead. In a hybrid setup, a similar error occurs when an office or data center resolver sends the query to public DNS rather than to the cloud private resolver.
As a result, a Kubernetes cluster, a CI/CD runner, a virtual machine, and a developer’s laptop may see different answers for the same name. Each answer will be “correct” within its own resolution path.
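The effect of a missing zone association can be illustrated with a toy model. The sketch below is not how any particular cloud resolver is implemented. The zone contents, the address ranges, and the `resolve` function are all hypothetical, and the view is chosen by source IP only to keep the example short.

```python
import ipaddress

# Hypothetical zone data for the same FQDN in two views.
PUBLIC_ZONE = {"app.example.com": "203.0.113.10"}
PRIVATE_ZONE = {"app.example.com": "10.0.1.10"}

# Hypothetical networks that the private zone is associated with.
ASSOCIATED_NETWORKS = [ipaddress.ip_network("10.0.0.0/16")]


def resolve(name: str, source_ip: str, associated=ASSOCIATED_NETWORKS):
    """Return (view, address) as seen by a client at source_ip."""
    src = ipaddress.ip_address(source_ip)
    in_private_view = any(src in net for net in associated)
    if in_private_view and name in PRIVATE_ZONE:
        return "private", PRIVATE_ZONE[name]
    # No association (or no private record): the query falls through to public DNS.
    return "public", PUBLIC_ZONE[name]


print(resolve("app.example.com", "10.0.5.7"))      # client in an associated network
print(resolve("app.example.com", "10.1.5.7"))      # client in a network that was not associated
print(resolve("app.example.com", "198.51.100.4"))  # external client
```

The second call is the failure case described above: the client is internal, but because its network is not associated with the private zone, it receives the public endpoint without any error being raised.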
Before introducing split-horizon, it is useful to check:
Which network the DNS query is sent from;
Which zone should respond — public or private;
Whether the private zone is associated with the required VPCs/VNets;
Whether there is a conflicting zone in the hybrid DNS setup;
Whether an internal service is accidentally using a public endpoint.
This check reduces the risk of traffic being silently routed along the wrong path. Otherwise, an internal service may use a public entry point, with different latency, unnecessary outbound traffic, different firewall/security group rules, and potential certificate and availability issues.
Split-horizon solves the problem of using consistent names, but it requires disciplined zone management and validation from different networks. Even when a client receives the correct answer, the next question remains: how long that answer will live in the cache and how quickly changes will reach clients. This is where TTL becomes important.
TTL: a trade-off between cutover speed and cache stability
TTL is the lifetime of a DNS response in cache. It does not mean that a record will “take effect at the provider in 60 seconds,” and it does not guarantee that all clients will see the new IP address at the same time. The authoritative DNS server may already be returning the new value, while a recursive resolver, the OS cache, or an application cache may still be holding the old response.
A simple example: api.example.com is being moved from one load balancer to another. If the record had been served for several hours with a TTL of 3600, lowering the TTL to 60 seconds immediately before the cutover will not force existing caches to forget the old address. They have already received a response with a one-hour lifetime and, if they behave correctly, will continue to use it until the TTL expires.
How to choose a TTL
A TTL should not be chosen on the assumption that “lower is better.” A low TTL helps compliant clients evict stale responses faster, but it increases the number of DNS queries and reliance on resolvers. A high TTL reduces noise and load, but makes changes take effect more slowly.
TTL | Where it is appropriate | Benefit | Limitation
--- | --- | --- | ---
30–60 seconds | Planned cutovers, dynamic records | Stale responses expire sooner | More DNS queries, with no guarantee of an immediate transition
300 seconds | Many application services | A balance between manageability and load | Changes still have a delay
3600+ seconds | Stable names that rarely change | Fewer queries and a more stable cache | Changes and cutovers are slow
For critical changes, the TTL should be lowered in advance. A proper plan is to first reduce the TTL, wait for the old higher TTL to expire, then change the record and verify name resolution from the required VPCs/VNets, hybrid networks, and external locations. After the change has stabilized, you can restore a higher TTL for normal operation.
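The timing of this plan can be worked out with simple arithmetic. The following sketch assumes that every cache in the path honors the TTL. The function name and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def cutover_plan(lowered_at: datetime, old_ttl_s: int, new_ttl_s: int) -> dict[str, datetime]:
    """Earliest time for each step of a planned record change."""
    # A cache that fetched the record just before the TTL was lowered
    # may keep that answer for up to old_ttl_s seconds.
    change_record = lowered_at + timedelta(seconds=old_ttl_s)
    # Once the record changes, compliant caches hold the old answer
    # for at most new_ttl_s seconds.
    old_answer_expired = change_record + timedelta(seconds=new_ttl_s)
    return {
        "lower_ttl": lowered_at,
        "change_record": change_record,
        "old_answer_expired": old_answer_expired,
    }


plan = cutover_plan(datetime(2024, 1, 1, 12, 0, 0), old_ttl_s=3600, new_ttl_s=60)
for step, when in plan.items():
    print(step, when.isoformat())
```

With a previous TTL of 3600 and a new TTL of 60, the record should not be changed until an hour after the TTL was lowered, and compliant caches can still hold the old answer for up to a minute after the change.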
A high TTL is useful when a record is stable. It reduces DNS load, reduces external dependencies for repeat requests, and makes client behavior less noisy. A short TTL is therefore not a universal improvement, but a tool for records that genuinely need to be managed quickly.
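The load side of the trade-off can be estimated roughly. The sketch below gives an upper bound under a simplifying assumption: each caching resolver keeps the name in use and refreshes it once per TTL. The resolver count is a made-up figure for illustration.

```python
def max_refresh_queries_per_hour(ttl_s: int, caching_resolvers: int) -> float:
    """Upper bound on queries per hour reaching the authoritative servers for one record."""
    return caching_resolvers * 3600 / ttl_s


# Hypothetical population of 1000 caching resolvers.
for ttl in (30, 60, 300, 3600):
    print(ttl, max_refresh_queries_per_hour(ttl, caching_resolvers=1000))
```

Under these assumptions, moving from a TTL of 3600 to 60 multiplies the refresh traffic for that record by 60, which is the cost paid for faster expiry of stale answers.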
Why TTL is not the same as failover time
TTL is a trade-off setting, not a guarantee of recovery time. It only limits how long a response is stored by the components in the chain that honor the TTL.
This directly affects a future DNS failover: the higher the TTL before an outage, the longer some clients may continue using the old endpoint. Lowering the TTL at the moment of failure is too late—the old responses have already been distributed across caches.
Even a low TTL does not eliminate all delays. A client may have an application cache, an OS cache, a recursive resolver that applies its own caching rules and may extend the TTL, or an already open connection to the old endpoint. DNS does not revoke such connections or force the application to recreate its connection pool immediately.
TTL therefore explains why DNS changes have inertia. DNS failover should be viewed not as an instant switch, but as a mechanism that changes responses for new DNS queries while taking health checks, caches, and client behavior into account.
DNS failover: how new DNS lookups switch over and why it isn’t instant
How DNS failover works
A typical scenario: api.example.com points to the primary load balancer in us-east, while the standby endpoint is in eu-west. The health check detects a failure in the primary region, the routing policy takes effect, and the authoritative DNS server starts returning the eu-west address.
In simplified terms, it looks like this:
Health check detects failure
→ authoritative DNS returns secondary endpoint
→ new DNS queries go to secondary endpoint
However, the switchover will be seen first by clients that make a new DNS query after the response changes. Others may still use the old cache or keep an open connection to the primary endpoint.
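This staggered behavior can be sketched as follows. The model is deliberately simplified: it assumes each client re-queries exactly when its cached answer expires and that all caches honor the TTL. The client names and timings are hypothetical.

```python
import math


def first_lookup_after_change(last_lookup_s: float, ttl_s: float, changed_at_s: float) -> float:
    """Time of the first lookup that can return the new answer."""
    if last_lookup_s >= changed_at_s:
        # The client already resolved the name after the response changed.
        return last_lookup_s
    # Number of whole TTL periods until the next refresh at or after the change.
    periods = math.ceil((changed_at_s - last_lookup_s) / ttl_s)
    return last_lookup_s + periods * ttl_s


# Seconds (on a shared clock) at which each hypothetical client last resolved the name.
clients = {"client-a": 0, "client-b": 90, "client-c": 100}
switch_at = {
    name: first_lookup_after_change(t, ttl_s=60, changed_at_s=100)
    for name, t in clients.items()
}
print(switch_at)
```

Even with an identical TTL, the three clients pick up the new endpoint at different moments, which is why the switchover looks gradual rather than simultaneous.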
The actual failover time is made up of several factors:
Failure detection
DNS response update
Cache lifetime
Client behavior
This is not an SLA, but a latency model. It shows why a low TTL helps but does not make failover instantaneous.
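The four factors can be combined into a rough estimate. The sketch below is a back-of-the-envelope model, not a measurement method. The field names and the example values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailoverModel:
    check_interval_s: float      # how often the health check runs
    failure_threshold: int       # consecutive failed checks before the endpoint is marked unhealthy
    dns_update_s: float          # time until the authoritative servers return the new answer
    ttl_s: float                 # TTL of the record
    client_extra_s: float = 0.0  # extra client-side delay (app cache, reconnect, retries)

    def detection_s(self) -> float:
        return self.check_interval_s * self.failure_threshold

    def best_case_s(self) -> float:
        # A client that happens to re-query right after the answer changes.
        return self.detection_s() + self.dns_update_s

    def worst_case_s(self) -> float:
        # A TTL-compliant client that cached the old answer just before the change.
        return self.detection_s() + self.dns_update_s + self.ttl_s + self.client_extra_s


model = FailoverModel(check_interval_s=10, failure_threshold=3,
                      dns_update_s=5, ttl_s=60, client_extra_s=30)
print(model.best_case_s(), model.worst_case_s())
```

In this example the TTL accounts for less than half of the worst case, so lowering it further would not remove the detection and client-side delays.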
What DNS does not cover
DNS does not manage already established TCP connections, keep-alive, connection pools, or retry logic in the application. If a client keeps a connection open to the primary load balancer, it may continue trying to use it until timeouts occur, retries run, or the connection is re-established.
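Because DNS cannot force this, the application has to bound how long it trusts a resolved address. The class below is a minimal sketch of that idea, with a hypothetical name and interface. A real connection pool would also need to close or recycle connections that were opened to the old address.

```python
import time
from typing import Callable, Optional


class ReResolvingEndpoint:
    """Caches a resolved address and re-resolves it once it is older than max_age_s."""

    def __init__(self, name: str, resolver: Callable[[str], str], max_age_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.name = name
        self._resolver = resolver      # injected so the lookup method can be swapped or tested
        self._max_age_s = max_age_s
        self._clock = clock
        self._address: Optional[str] = None
        self._resolved_at = 0.0

    def address(self) -> str:
        now = self._clock()
        if self._address is None or now - self._resolved_at >= self._max_age_s:
            self._address = self._resolver(self.name)
            self._resolved_at = now
        return self._address
```

New connections would be opened against `address()`, so a changed DNS answer is picked up after at most `max_age_s` seconds plus whatever caching the resolver path adds.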
Private endpoints have a separate limitation: external health checks often cannot reach a private IP inside a VPC or VNet. For this reason, the health of a private service must be checked from a network environment that can reach it, or its state must be passed to the DNS failover logic through an internal health signal.
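A check run from inside the network can be as simple as a TCP connection attempt. The sketch below assumes it runs on a host that can reach the private address. The function name is hypothetical, and how the result is reported to the DNS or monitoring service is left out because it depends on the provider.

```python
import socket


def tcp_health_check(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call from an agent inside the VPC (address and port are placeholders):
#   tcp_health_check("10.0.1.10", 5432)
```

A successful TCP connection only shows that the port accepts connections. An application-level probe is usually needed to confirm that the service itself is healthy.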
Because of this, DNS failover should be treated as a gradual switchover for new name resolutions, not as a single event for all clients. It is useful as a routing layer during a failure, but it does not replace a load balancer, application-level retries, or a complete DR plan.
Conclusion
DNS in cloud infrastructure should be designed based on the source of the request and the expected endpoint. Public DNS serves the external perimeter, private zones serve internal networks, and split-horizon DNS allows a single FQDN to return different responses for different networks, but it requires precise zone attachment and validation from real environments.
TTL determines the inertia of DNS responses in the cache, while DNS failover changes the route only for new name resolutions. Therefore, for critical systems, DNS must be part of the overall fault-tolerance design, alongside load balancers, health checks, data replication, RTO/RPO, recovery procedures, and regular testing.
FAQ
How does a private DNS zone differ from a public DNS zone?
A public DNS zone responds to queries from the internet and typically returns a public endpoint: a CDN, a public IP address, a CNAME to an external service, or an external load balancer. A private DNS zone is accessible only from associated VPCs, VNets, or hybrid networks and returns internal addresses: a private IP address, a private load balancer, an internal API, or a database endpoint.
When should you use split-horizon DNS?
Split-horizon DNS is appropriate when a single FQDN must work for both external clients and internal services, but return different endpoints. For example, from outside, api.example.com points to a public load balancer, while inside the VPC it points to a private one. This approach should be used only when you have control over the zones, network associations, and DNS checks across all critical environments.
Why doesn’t a low TTL guarantee instant failover?
TTL limits how long compliant caches retain a DNS response, but it does not control every component in the chain. Recursive resolvers, the OS cache, the application cache, and already open connections may retain the old state longer than expected. As a result, a low TTL speeds up the removal of the old response, but it does not make DNS failover instantaneous.
What TTL should you choose for a cloud service?
There is no one-size-fits-all value. For records that change, a short TTL—such as 30–300 seconds—is often used, but you need to account for the increase in DNS queries and the dependence on resolvers. For stable names, you can set a higher TTL if slower switching does not violate availability requirements.
Does DNS failover replace a load balancer?
No. DNS failover changes the response to new DNS queries. A load balancer handles traffic and backend pools after the client has already received an address and initiated a connection. In a fault-tolerant architecture, DNS failover and a load balancer can complement each other, but they do not perform the same function.
What is especially important for private endpoints during failover?
The health check must be able to reach the private service from the appropriate network segment. External checks often cannot access a private IP inside a VPC or VNet, so an internal health signal, a metric, an agent in the correct network, or integration with cloud monitoring is needed.