DDoS Protection in the Cloud: L3/L4/L7, Anycast, Rate Limiting, and a Response Plan

When a service starts returning 5xx errors, latency rises, and the CDN reports anomalous traffic, the first mistake is to react indiscriminately: turning every protection on at once or, conversely, turning everything off. The same symptom, "service unavailable," can indicate different problems: a saturated network link, overload at the HTTP layer, an API attack, credential stuffing, or direct bypass of the CDN to the origin.

Cloud DDoS protection is a layered system:

  • L3/L4 protection is needed against volumetric attacks, SYN/UDP/TCP floods, and network perimeter overload;
  • Anycast and the provider's network help distribute incoming traffic, but do not replace filtering on their own;
  • A CDN reduces the load on the origin through caching and proxying, provided the origin is not directly accessible;
  • WAF and bot protection operate on HTTP/S requests, client behavior, URIs, methods, and suspicious patterns;
  • Rate limiting and quotas help with L7 pressure, API abuse, credential stuffing, and expensive operations;
  • A response runbook predefines roles, thresholds, authorization for drastic measures, and the procedure for escalation to the provider.

The core logic is simple: first determine where the pressure is occurring, then choose the appropriate control layer. If the network link is saturated, limits inside the application will no longer help. If an expensive API method is overloading the database, network filtering alone is not enough. If the attack is bypassing the CDN and going directly to the origin IP, direct access to the origin must be blocked.

Practical DDoS readiness is not just about having protection enabled with a provider. It requires a clear decision chain: which metrics to monitor, where to stop traffic, who can enable restrictions, how to avoid blocking legitimate users, and when to involve a cloud, CDN, or anti-DDoS provider.

First, determine the attack layer: L3/L4, L7, or API abuse

The first diagnostic question is not “which protection should we enable,” but “where exactly is the pressure being applied.” 5xx errors, increased latency, and user complaints may look the same on a status dashboard, but technically they can mean different things: a saturated network perimeter, an overloaded application, exhausted database connections, large-scale login attempts, or traffic bypassing the CDN and hitting the origin.

L3/L4: pressure on the network and transport layers

If the application does not show a noticeable increase in RPS, but pps, SYN/UDP/TCP packets, bps, dropped packets, and load on the network perimeter have risen sharply, the attack is most likely at L3/L4. In this case, a significant portion of the traffic may be dropped or degraded before it ever reaches the application.

Limits inside the backend code will not help here: the attack traffic is not made of HTTP requests, and when the link is saturated even legitimate requests may never reach the application. Filtering must happen at the cloud provider or anti-DDoS provider level so that junk traffic is dropped before it reaches the customer's infrastructure. In parallel, check the status of the load balancer, network links, and network interfaces.

L7: load on HTTP/S and the application

If RPS is increasing, the network link is not saturated, and CPU usage, memory usage, latency, and 4xx/5xx responses are rising along with the number of HTTP/S requests, this points more to an L7 HTTP flood. These requests can look almost like normal user traffic: they open pages, call search, navigate through URIs, and use permitted methods.

Different measures apply here: WAF, bot protection, URI- and method-based rules, challenges for suspicious clients, request rate limits, and origin protection. The main goal is to stop the traffic before it overloads the application and the backend.

API abuse: load on resource-intensive operations

A separate scenario is API abuse, or resource exhaustion. The requests are technically legitimate, but they put load on resource-intensive operations: reports, search, exports, payments, external APIs, queues, or the database.

The indicators are usually visible not only in RPS but also in specific resources: the number of DB connections, queue depth, external API response times, 429/5xx rates, and latency on a specific API endpoint all increase. In these cases, filtering alone is not enough; quotas, per-user/API-key limits, queues, caching, and emergency safeguards for heavy operations are also needed.

Check for CDN bypass separately

Sometimes the CDN appears stable, but the origin is overloaded when accessed directly. This may mean that the attacker knows the public IP address of the origin server and is bypassing the CDN/WAF.

Check traffic to the origin IP, connections outside the CDN, network logs, and application 5xx errors. The basic mitigation is to block direct access to the origin and allow traffic only from trusted CDN, proxy, or DDoS provider networks.
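A minimal sketch of the origin-side check, assuming the CDN or DDoS provider publishes the CIDR ranges it connects from. The ranges below are RFC 5737 and RFC 3849 documentation placeholders, not real provider networks, and in practice this rule usually lives in a security group or firewall rather than in application code. Whatever enforces it must look at the TCP peer address, not a header such as X-Forwarded-For, which the client can set to any value.

```python
import ipaddress

# Hypothetical allowlist: documentation ranges standing in for the CIDR blocks
# that a real CDN, proxy, or DDoS provider would publish.
TRUSTED_EDGE_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_from_trusted_edge(peer_ip: str) -> bool:
    """Return True if the connecting (TCP peer) address is a trusted edge network."""
    try:
        addr = ipaddress.ip_address(peer_ip)
    except ValueError:
        return False  # malformed address: treat as untrusted
    # Membership across IP versions (v4 address, v6 network) simply returns False
    return any(addr in net for net in TRUSTED_EDGE_NETWORKS)
```

Connections for which this returns False are exactly the direct-to-origin traffic the paragraph above describes; they should be dropped rather than served.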

A quick diagnostic process looks like this:

  • If bps/pps/SYN/UDP/TCP metrics are increasing and packets are being dropped before they reach the application, start with the network layer and the provider;
  • If RPS, latency, and HTTP/S 4xx/5xx errors are increasing, focus on the CDN, WAF, bot protection, and L7 rules;
  • If a specific API endpoint, database, queue, or external service is overloaded, apply limits, quotas, and protection for expensive operations;
  • If the CDN is quiet but the origin is under load, check for direct access to the origin IP.
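The four rules above can be encoded as a rough triage helper. This is a sketch under assumptions: every metric is expressed as a ratio to its normal baseline, and the 3x spike factor is a placeholder to tune per service. Several rules can fire at once, because real attacks often combine layers.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Current metrics as ratios to their normal baseline (1.0 = normal)."""
    pps: float               # packets per second at the perimeter
    bps: float               # bits per second at the perimeter
    rps: float               # HTTP/S requests per second reaching the application
    http_errors: float       # 4xx/5xx responses
    endpoint_latency: float  # latency of the worst single API endpoint
    db_connections: float
    cdn_rps: float           # requests seen by the CDN
    origin_rps: float        # requests seen by the origin


def suspected_layers(s: Snapshot, spike: float = 3.0) -> list[str]:
    """Return the layers to check first, following the quick diagnostic rules."""
    layers = []
    # Network volume up while HTTP traffic is flat: pressure below the HTTP layer
    if max(s.pps, s.bps) >= spike and s.rps < spike:
        layers.append("L3/L4")
    # More HTTP requests together with more errors: likely an HTTP flood
    if s.rps >= spike and s.http_errors >= spike:
        layers.append("L7")
    # One endpoint or the database is struggling: expensive operations
    if s.endpoint_latency >= spike or s.db_connections >= spike:
        layers.append("API abuse")
    # Origin busy while the CDN is quiet: traffic is probably bypassing the CDN
    if s.cdn_rps < spike and s.origin_rps >= spike:
        layers.append("CDN bypass")
    return layers
```

The output only says where to look first; the decision about which controls to enable still belongs to the people on the incident.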

This distinction reduces the risk of making the first mistake. Rate limiting will not help if the link is already saturated, and network filtering alone will not stop expensive API calls that appear valid at the perimeter.

After diagnostics, move on to the protection architecture and determine where to stop traffic before it reaches the origin server.

Cloud Protection Layers: Where to Block Traffic Before It Reaches the Origin

Network Perimeter and Anycast

Anycast announces the same IP addresses from multiple points in the provider's network, so incoming traffic is routed to one of several locations instead of a single one. Think of it not as a single entrance to a building but as several entrances in different places: the attack is not concentrated at one door. However, distributing traffic is not the same as scrubbing it. Each entrance still needs filters.

At this layer, L3/L4 volumetric attacks and protocol floods must be mitigated: high packet volumes, SYN/UDP/TCP spikes, and transport-layer pressure. This traffic must be dropped before it reaches the customer’s link, load balancer, or virtual machines.

CDN, WAF, and Bot Protection

A CDN reduces load on the origin through distribution and caching. Static content and some recurring requests never reach the origin server. However, if the origin is directly accessible by IP address, an attacker can bypass the CDN entirely. In that case, protection is technically enabled, but the load still reaches the application.

WAF and bot protection operate at the HTTP/S layer, working with URIs, methods, headers, client behavior, suspicious patterns, and automated traffic. This is where challenge mechanisms, route-based rules, and limits on suspicious activity are appropriate. However, a WAF should not be the first line of defense against high-volume UDP or SYN traffic—that is the role of network-layer DDoS filtering.
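To make route- and method-based rules concrete, here is a toy first-match evaluator. Commercial WAFs use their own rule languages and signals; the rule set, the action names, and the 0..1 `bot_score` (higher meaning more bot-like) are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                            # "block" or "challenge"
    path_prefix: str                       # matched against the request path
    methods: frozenset[str] = frozenset()  # empty = any method
    min_bot_score: Optional[float] = None  # hypothetical score threshold


# Hypothetical rule set, evaluated top to bottom; the first match wins.
RULES = [
    Rule("block", "/", frozenset({"TRACE", "TRACK"})),
    Rule("challenge", "/login", frozenset({"POST"}), min_bot_score=0.5),
    Rule("challenge", "/search", frozenset({"GET"}), min_bot_score=0.7),
]


def evaluate(method: str, path: str, bot_score: float) -> str:
    """Return the action of the first rule that matches, or "allow"."""
    for rule in RULES:
        if not path.startswith(rule.path_prefix):
            continue
        if rule.methods and method.upper() not in rule.methods:
            continue
        if rule.min_bot_score is not None and bot_score < rule.min_bot_score:
            continue
        return rule.action
    return "allow"  # no rule matched: pass the request towards the origin
```

Ordering matters in such rule sets: a broad rule placed first can shadow narrower ones below it.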

Application, API, and Origin

Rate limiting, quotas, and API rules should be enforced closer to the business logic: by user, API key, route, method, and costly operation. They help mitigate L7 pressure and API abuse, but they must take effect before the database, queues, and external services become overloaded.
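One common way to enforce such limits is a token bucket per key. The sketch below assumes a single process with an in-memory dict and keys buckets by (principal, route), where the principal can be a user ID, API key, or session. A production limiter would typically keep its counters in a shared store and evict idle keys; the injectable clock exists only to make the behavior testable.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Bucket:
    tokens: float
    updated: float


class KeyedTokenBucket:
    """Token-bucket limiter with one bucket per (principal, route) key."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self.buckets: dict[tuple[str, str], Bucket] = {}

    def allow(self, principal: str, route: str) -> bool:
        now = self.clock()
        key = (principal, route)
        bucket = self.buckets.get(key)
        if bucket is None:
            # New keys start with a full bucket so short bursts are tolerated
            bucket = self.buckets[key] = Bucket(tokens=float(self.burst), updated=now)
        # Refill in proportion to elapsed time, capped at the burst size
        bucket.tokens = min(self.burst, bucket.tokens + (now - bucket.updated) * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False  # the caller should answer 429, ideally with Retry-After
```

Because the key includes the route, one API key exhausting its report quota does not stop the same key from using cheaper endpoints.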

The load balancer and origin are a late-stage line of defense. A load balancer can distribute acceptable load, but it should not serve as the primary DDoS filter. The origin should be closed off from direct access, allowing traffic only from the CDN, proxies, or the DDoS provider’s networks.

Each layer should reduce the load on the next one. If the origin remains publicly accessible or rules are enforced only inside the application, part of the cloud protection is effectively bypassed: an attacker does not have to go through the CDN and WAF if they know the real IP address of the origin server.

In this architecture, autoscaling is a supporting mechanism, not DDoS protection. It can absorb part of a legitimate traffic spike or smooth out L7 load, but it does not solve a saturated network link and can quickly increase costs during an attack. Scaling should therefore complement filtering, not replace it.

Observability is also needed. The team must be able to see exactly where protection was triggered: how much traffic was dropped by the provider, what reached the CDN/WAF, which requests made it to the load balancer, and how the origin/API and database are behaving. Without this, the architecture becomes a set of “black boxes,” and during an incident it is unclear which layer needs to be strengthened.
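A simple way to read these numbers together is a per-layer funnel: compare what each layer received with what the next one received over the same window. The sketch assumes every layer reports comparable request counts; the provider's network edge counts packets and bits, so it belongs in a separate view. Layer names and figures are illustrative.

```python
def absorbed_per_layer(received: list[tuple[str, int]]) -> dict[str, float]:
    """Share of the initial volume that each layer served or stopped itself.

    `received` lists layers from the outermost inward, each with the number of
    requests it received in the same time window. The innermost layer is not
    reported, since nothing downstream shows what it passed on.
    """
    if len(received) < 2 or received[0][1] <= 0:
        return {}
    total = received[0][1]
    shares = {}
    # What a layer received minus what the next layer received = what it absorbed
    for (name, got), (_, passed_on) in zip(received, received[1:]):
        shares[name] = (got - passed_on) / total
    return shares
```

If, for example, the CDN edge suddenly absorbs much less than usual while the origin count climbs, that points at the layer that needs strengthening.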

Rate Limiting for L7 and APIs: When It Helps and Where It Can Be Risky

When traffic has already passed through network filters, a CDN, and a WAF, and looks partly legitimate, rate limiting is no longer a universal “DDoS protection button” but a targeted control. It is applied based on one attribute or a combination of attributes: IP address, user, session, API key, endpoint, HTTP method, ASN, geolocation, device, or account at login.

A single limit for the entire service is almost always too blunt. It can help with HTTP floods, credential stuffing, high-volume API calls, and expensive operations such as search, reports, exports, order payment, or login. However, the same limit can block legitimate demand during a sale, release, or marketing campaign.

Where rate limiting helps

Limits are useful when there is a clear overload point and a measurable baseline of normal behavior. For example:

  • /login — limits on login attempts, authentication errors, and request frequency per account;
  • Search and catalog — protection against overly rapid crawling of search results and scraping;
  • Reports and exports — quotas, queuing, caching, and limits on the selected time period;
  • Checkout — protection against repeated suspicious payment transactions;
  • Partner APIs — limits by API key, tenant, and operation type;
  • External integrations — limits on calls that could overload a third-party service.
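The list above translates naturally into a per-route policy table: each route gets its own limit and its own key attribute (the account for login, the API key for partner calls, and so on). The routes, attribute names, and numbers below are hypothetical; the function only builds the key and limit that a limiter would then enforce.

```python
from typing import Mapping

# Hypothetical policies: which attribute each route is limited by, and how hard.
ROUTE_POLICIES = {
    "/login":   {"key": "account", "per_minute": 10},
    "/search":  {"key": "session", "per_minute": 120},
    "/reports": {"key": "api_key", "per_minute": 5},
    "/partner": {"key": "api_key", "per_minute": 600},
}
DEFAULT_POLICY = {"key": "ip", "per_minute": 300}


def limit_key(path: str, request: Mapping[str, str]) -> tuple[str, int]:
    """Return (bucket key, requests per minute) for a request."""
    # Longest matching prefix wins, so specific routes override general ones
    matches = [p for p in ROUTE_POLICIES if path.startswith(p)]
    prefix = max(matches, key=len) if matches else "*"
    policy = ROUTE_POLICIES.get(prefix, DEFAULT_POLICY)
    attr = policy["key"]
    value = request.get(attr)
    if not value:  # e.g. an anonymous caller: fall back to the client IP
        attr, value = "ip", request.get("ip", "unknown")
    return f"{prefix}|{attr}={value}", policy["per_minute"]
```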

The main criterion is not high RPS on its own, but the combination of deviation from the norm, an overload point, and context. The same traffic can be an attack on a normal day and expected during a sale.
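The three conditions can be combined in a small check. This sketch assumes a known calendar of planned peaks and, rather than ignoring traffic during a sale, simply raises the bar: attacks can coincide with campaigns too. Both factors are placeholders.

```python
from datetime import datetime


def should_mitigate(current_rps: float, baseline_rps: float, overloaded: bool,
                    now: datetime, planned_peaks: list[tuple[datetime, datetime]],
                    deviation_factor: float = 3.0,
                    planned_peak_factor: float = 10.0) -> bool:
    """Require deviation from the norm AND an overloaded resource, with a
    higher bar during known peaks such as a sale, release, or campaign."""
    in_planned_peak = any(start <= now < end for start, end in planned_peaks)
    factor = planned_peak_factor if in_planned_peak else deviation_factor
    deviates = baseline_rps > 0 and current_rps >= factor * baseline_rps
    return deviates and overloaded
```

Here the same 5x traffic triggers mitigation on an ordinary day but not during a planned sale, unless it grows well beyond the expected peak.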

Where limits can be dangerous

Rate limiting should not be enabled "by feel." An overly blunt rule can affect corporate NAT networks, mobile operators, partner integrations, search engine crawlers, and legitimate users. NAT matters here because many real clients can share a single public IP address or a small pool of them, so a per-IP limit hits all of them at once.

A practical example: an API endpoint for generating reports is called correctly, but each request runs an expensive query and holds database connections. Limiting by IP alone is weak in this case: a large customer or partner may be coming through a shared NAT. It is safer to apply limits specifically to report generation, introduce a queue, cache the result, narrow the export period, and enable an emergency safeguard to protect the database.
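A minimal sketch of such a guard around report generation, assuming a hypothetical `run_query` function that performs the expensive database work. It combines an emergency switch, a cap on the export period, a result cache, and a cap on concurrent reports. For brevity it rejects with 429 when all slots are busy instead of queuing, and the cache has no size limit or expiry; both would need attention in a real service.

```python
import threading
from datetime import date, timedelta
from typing import Callable

MAX_EXPORT_PERIOD = timedelta(days=31)  # hypothetical cap on the export window


class ReportGuard:
    """Degrade report generation in a controlled way instead of exhausting the DB."""

    def __init__(self, run_query: Callable[[str, date, date], str],
                 max_concurrent: int = 4) -> None:
        self.run_query = run_query  # the expensive DB query (hypothetical)
        self.slots = threading.BoundedSemaphore(max_concurrent)
        self.cache: dict[tuple[str, date, date], str] = {}
        self.emergency_off = False  # flipped by on-call as an emergency safeguard

    def generate(self, tenant: str, start: date, end: date) -> tuple[int, str]:
        if self.emergency_off:
            return 503, "Report generation is temporarily disabled"
        if end < start or end - start > MAX_EXPORT_PERIOD:
            return 400, f"Period must be at most {MAX_EXPORT_PERIOD.days} days"
        key = (tenant, start, end)
        if key in self.cache:
            return 200, self.cache[key]  # repeated request: no database work
        if not self.slots.acquire(blocking=False):
            return 429, "Too many reports in progress, retry later"
        try:
            result = self.run_query(tenant, start, end)
            self.cache[key] = result
            return 200, result
        finally:
            self.slots.release()
```

Note that the limit is tied to the expensive operation and the tenant, not to the client IP, so a large customer behind a shared NAT is not penalized for unrelated traffic.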

This way, the service does not “go down completely”; it degrades in a controlled manner: users can continue working with core functionality while the resource-intensive operation is restricted.

Soft and Hard Measures

Responses should vary in severity. Soft measures are appropriate for early signs of pressure: throttling, challenges, temporary rate reductions, separate quotas for expensive operations, and limits on repeated login failures.

Hard measures are needed when degradation is confirmed: 5xx errors or latency are increasing, the database, queue, or external integration is overloaded, and soft limits are not reducing load quickly enough. These measures include widespread 429 responses, blocking ranges, strict limits on API methods, temporarily disabling resource-intensive features, and forcing client verification.
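The escalation from soft to hard measures can be written down as an explicit rule so that it is not improvised mid-incident. In this sketch the grace period and the minimum load reduction are hypothetical thresholds; the function only suggests a tier, and who may actually apply it is a separate decision.

```python
from enum import Enum


class Tier(Enum):
    NONE = "none"
    SOFT = "soft"  # throttling, challenges, quotas for expensive operations
    HARD = "hard"  # widespread 429s, range blocks, disabling heavy features


def choose_tier(early_signs: bool, degradation_confirmed: bool,
                minutes_since_soft: float, load_drop_pct: float,
                grace_minutes: float = 10.0, min_drop_pct: float = 20.0) -> Tier:
    """Suggest hard measures only when degradation is confirmed and the soft
    measures already in place have not brought the load down in time."""
    soft_failed = minutes_since_soft >= grace_minutes and load_drop_pct < min_drop_pct
    if degradation_confirmed and soft_failed:
        return Tier.HARD
    if early_signs or degradation_confirmed:
        return Tier.SOFT
    return Tier.NONE
```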

The authority to apply such restrictions should be defined in advance. Typically, the on-call engineer can enable only approved soft limits and safeguards. Hard measures are approved by the incident manager together with the service or product owner if there is a risk of affecting login, payment, partner APIs, or other critical flows.
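Pre-agreed authority can be kept as data rather than tribal knowledge. The roles, measure names, and critical flows below are hypothetical examples of what a runbook might list; the check mirrors the rule above: on-call applies only approved soft measures, and hard measures touching critical flows also need the service or product owner's approval.

```python
# Hypothetical measure catalogue; the real list belongs in the runbook.
SOFT = {"throttle", "challenge", "login_failure_limit", "heavy_op_quota"}
HARD = {"global_429", "block_range", "strict_api_limit", "disable_feature",
        "force_verification"}
ALLOWED_BY_ROLE = {"on_call": SOFT, "incident_manager": SOFT | HARD}
CRITICAL_FLOWS = {"login", "payment", "partner_api"}


def may_apply(role: str, measure: str, affected_flows: set[str],
              owner_approved: bool = False) -> bool:
    """Check a proposed measure against pre-approved permissions."""
    if measure not in ALLOWED_BY_ROLE.get(role, set()):
        return False
    # Hard measures that touch critical flows need the owner's sign-off as well
    if measure in HARD and affected_flows & CRITICAL_FLOWS:
        return owner_approved
    return True
```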

In summary, rate limiting is useful not as a single global rule, but as a set of predefined scenarios with thresholds, owners, and a rollback plan. Therefore, the next layer of protection is not a new technology, but a runbook that defines who makes the decision, which metrics confirm an attack, and when the team can move from soft measures to hard ones.

Response runbook: roles, metrics, decisions, and provider

If hard limits could affect customers, partners, and revenue, decisions about them should not be left to the on-call engineer to make “based on the situation.” A runbook must be prepared before an attack: roles, communication channels, access rights, thresholds, a list of approved measures, and escalation criteria.

During an incident, it is important to know in advance who is responsible for what. The incident manager coordinates decisions, the network/CDN/WAF team manages the perimeter, the application and API owner assesses the impact on business scenarios, SRE monitors platform resilience, the security team analyzes sources, support relays user reports, and a dedicated contact communicates with the cloud, CDN, or DDoS provider.

The workflow should be short:

  1. Confirm the incident. Rule out a planned peak, a release, a marketing campaign, an internal outage, or a configuration error.
  2. Classify the traffic pressure by metrics. Check bps/pps, SYN/UDP/TCP indicators, dropped/blocked traffic, RPS, 4xx/5xx, latency, origin CPU/RAM, database connections, queue lengths, and statistics by endpoint, IP, ASN, geography, and user-agent.
  3. Enable only authorized measures. For example, temporary limits, challenges, blocking of obvious sources, or emergency safeguards for expensive operations—within pre-approved permissions.
  4. Monitor legitimate traffic. Check whether false 429/403 responses have increased, and whether partner integrations, login, order payment, or critical APIs have broken.
  5. Escalate to the provider. Do this in the event of L3/L4 pressure, CDN bypass to the origin, an increase in dropped traffic, insufficient visibility, or if the current rules do not reduce degradation.
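Step 4 is easy to skip under pressure, so it helps to automate at least a rough check. This sketch assumes per-flow metrics for the share of requests answered with 403/429, the same share in a normal period, and a business success rate (completed logins or payments per attempt). The thresholds are placeholders, and the floor keeps flows with a near-zero baseline from alerting on every single block.

```python
from dataclasses import dataclass


@dataclass
class FlowStats:
    blocked_share: float           # share of requests answered 403/429 right now
    baseline_blocked_share: float  # the same share in a normal period
    success_rate: float            # e.g. completed logins or payments / attempts


def flows_to_review(stats: dict[str, FlowStats], max_rise: float = 2.0,
                    floor: float = 0.01, min_success: float = 0.95) -> list[str]:
    """Return critical flows whose blocking or success rate suggests false positives."""
    flagged = []
    for flow, s in stats.items():
        limit = max(s.baseline_blocked_share * max_rise, floor)
        if s.blocked_share > limit or s.success_rate < min_success:
            flagged.append(flow)
    return sorted(flagged)
```

A flagged flow is a prompt to inspect the rules affecting it and, if needed, roll them back, not an automatic rollback.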

It is best to send the provider the start time, affected services, symptoms, sample sources/ASNs/URIs, traffic graphs, rules already enabled, and the impact on users right away. This speeds up filtering and reduces the risk of lengthy clarifications at a time when every minute affects availability.
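Preparing this message as a template before an incident saves time. The sketch below is a hypothetical structure whose fields mirror the list above and that renders plain text for a support ticket; each provider has its own ticket format, so treat the field names as an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


def _join(items: list[str]) -> str:
    return ", ".join(items) if items else "none"


@dataclass
class EscalationReport:
    started_at: datetime         # timezone-aware start of the incident
    affected_services: list[str]
    symptoms: str
    sample_sources: list[str]    # example IPs, ASNs, or URIs
    graph_links: list[str]       # links to traffic dashboards
    rules_enabled: list[str]     # mitigations already switched on
    user_impact: str

    def to_text(self) -> str:
        start = self.started_at.astimezone(timezone.utc)
        return "\n".join([
            f"Start time (UTC): {start:%Y-%m-%d %H:%M}",
            f"Affected services: {_join(self.affected_services)}",
            f"Symptoms: {self.symptoms}",
            f"Sample sources/ASNs/URIs: {_join(self.sample_sources)}",
            f"Traffic graphs: {_join(self.graph_links)}",
            f"Rules already enabled: {_join(self.rules_enabled)}",
            f"User impact: {self.user_impact}",
        ])
```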

After the attack, review the timeline, what worked and what did not, where false positives occurred, how much the degradation cost, and which decisions came too late. A runbook does not reduce the attack volume itself; it reduces chaos in decision-making. That matters because chaos may be exactly what the attacker is counting on: rushed decisions lead to unnecessary downtime, service level degradation, and direct financial and reputational losses. Use the postmortem findings to update the runbook: thresholds, rules, contacts, authority to apply strict measures, and the escalation process.

Brief readiness checklist

Before an attack, the team should know normal traffic peaks, have metrics for the network, CDN/WAF, application, API, and database, keep the origin closed to direct access, have provider contacts, and have agreed permissions to enable stricter measures.

During an attack, it is important to quickly distinguish network pressure from HTTP and API load, enable only approved restrictions, watch for false 403/429 responses, and separately monitor critical workflows: login, payments, and partner APIs.

After an attack, the thresholds, filtering rules, allowlists, limits for expensive operations, provider contacts, and the runbook itself should be updated.

The main indicator of readiness is not that “DDoS protection is enabled,” but a clear decision chain: where the pressure is coming from, which layer stops it, who is authorized to strengthen the measures, and how the team returns the service to normal operation without unnecessarily blocking legitimate users.

Conclusion

Cloud-based DDoS protection should not be a set of enabled services, but a managed decision-making framework. L3/L4 attacks should be stopped before they reach the customer’s infrastructure; L7 traffic should be filtered at the HTTP/S layer; and API abuse should be controlled through quotas, limits, and protection for expensive operations.

The runbook links these measures to people, metrics, thresholds, and escalation to the provider. This is what reduces the risk of two extremes: failing to stop an attack in time or, conversely, blocking your own users.

FAQ

How does an L3/L4 DDoS attack differ from an L7 HTTP flood?

L3/L4 attacks put pressure on the network and transport layers: bps, pps, and SYN/UDP/TCP packet rates increase, and the link or network perimeter can become saturated. An L7 HTTP flood looks like a large volume of HTTP/S requests and overloads the application itself, the backend, CPU, memory, connections, or specific routes.

Why doesn’t rate limiting protect against all DDoS attacks?

Rate limiting works when traffic has reached the HTTP/API layer and can be restricted by IP address, user, key, endpoint, or session. If the network link is already saturated by UDP or SYN floods, application-level limits will not take effect in time—filtering at a cloud or DDoS protection provider is required.

Do you need to use Anycast if you already have a CDN?

Anycast and a CDN often work together, but they solve different problems. Anycast distributes incoming traffic across the provider’s network, while a CDN caches and proxies content. However, neither Anycast nor a CDN replaces a WAF, origin protection, or dedicated API rules.

Which metrics are most important to monitor during an attack?

At a minimum: bps, pps, SYN/UDP/TCP indicators, dropped/blocked traffic, RPS, 4xx/5xx, latency, CPU/RAM usage on the origin, database connections, queue lengths, and spikes per endpoint. ASN, geography, user-agent, and statistics by IP address or API key are also useful.

When should you escalate an incident to the provider?

Escalation is required in cases of L3/L4 pressure, CDN bypass targeting the origin IP, an increase in dropped packets, insufficient visibility, ineffective current rules, or a risk of critical services becoming unavailable. It is best to immediately provide the provider with the start time, symptoms, graphs, affected services, and the mitigation measures already enabled.

What should you do after the attack is over?

Review the timeline, check the effectiveness of filtering and limits, identify false blocks, and assess the impact on users and business processes. Then update thresholds, WAF/rate limiting rules, allowlists, provider contacts, and the runbook.

