What Are Spot Instances / Preemptible VMs, and What Kinds of Workloads Are They Good For?

Spot Instances and Preemptible VMs are interruptible virtual machines that cloud providers offer at a discount from their spare compute capacity. The basic idea is simple: the compute is cheaper, but the resource itself is not guaranteed and may be reclaimed.

In practice, these machines are not meant to be “cheap servers in general.” They are meant for workloads that can tolerate a stop, a restart, or the loss of an individual node without breaking the whole process.

Typical examples include:

  • Batch processing
  • CI/CD jobs
  • Dev/test environments
  • Background compute tasks
  • Fault-tolerant workloads

For critical or stateful services, however, they are a much more debatable option. These VMs can be terminated by the provider, so they are usually used only where the application is designed to survive interruptions, restarts, and the loss of an individual machine without causing the whole service to collapse.

In short, Spot / Preemptible VMs are not just regular VMs at a lower price, but discounted compute capacity for workloads that are designed from the start to handle interruptions.

Spot Instances and Preemptible VMs: What They Actually Are

Spot Instances and Preemptible VMs are discounted virtual machines that run on a cloud provider’s spare compute capacity.

Their main advantage is obvious: these resources cost noticeably less than standard VMs.

But that lower price comes with an important limitation. These machines are not treated as fully guaranteed resources. If the provider needs that capacity back, the instance may be stopped, terminated, or preempted. The details vary from one cloud to another, but the underlying logic stays the same: you save money on compute, but you do not get a full guarantee that a specific VM will keep running for as long as you would like.

So at their core, these are not simply “ordinary VMs, just cheaper,” but discounted compute capacity with an interruption risk built in.

And it would be a mistake not to mention one important, and slightly amusing, nuance: different cloud providers use different names for essentially the same idea.

Why Different Clouds Use Different Names

That difference exists not because these are fundamentally different types of VMs, but because providers use different terminology and have evolved their own services over time. AWS uses the name Spot Instances, Azure uses Spot Virtual Machines, and in Google Cloud the main current name is Spot VMs, even though the older term Preemptible VMs still appears in documentation and legacy materials. Google explicitly states that Spot VMs are the newer version of the preemptible model and recommends using them instead.

This matters for one simple reason: a reader may encounter different names and assume they are looking at different models. In practice, the core logic is very similar everywhere: the provider offers compute capacity at a lower price, while retaining the right to interrupt the VM and return that capacity to the shared pool.

The naming differences become easier to see in a simple table:

| Cloud | Name | What matters to understand |
|---|---|---|
| AWS | Spot Instances | AWS’s main term for discounted, interruptible EC2 capacity |
| Azure | Spot Virtual Machines | The same general logic of discounted, interruptible VMs |
| Google Cloud | Spot VMs | The main current term in Google Cloud |
| Google Cloud | Preemptible VMs | Historical term; the model is similar, but Google recommends Spot VMs instead |

The main takeaway from this table is simple: the names differ, but the underlying approach is the same.

That is why, in articles like this, it is often useful to mention both labels (Spot Instances / Preemptible VMs) so that readers coming from AWS, Azure, or Google Cloud terminology do not get lost. Once that is clear, the conversation can move away from the wording itself and toward the real point: what exactly you gain on price, and what kind of risk you accept in return.

The Hidden Trade-Off Behind the Lower Price

The main advantage of these machines is obvious: they can reduce compute costs significantly.

That is exactly why Spot / Preemptible VMs are often considered for workloads that are flexible in timing, can scale in bursts, or can tolerate the loss of individual nodes without much drama.

But the lower price is always tied to risk.

The provider does not promise that a specific VM will remain available for as long as the application wants it to. If that spare capacity is needed again, the instance may be stopped, preempted, or terminated.

The core economics of these machines is built around a simple trade-off:

| What you get | What you take on |
|---|---|
| Cheaper compute | The risk that the VM may be interrupted |
| A chance to push the budget further | The need to design the workload around interruptions |
| Flexible scaling with discounted capacity | No guarantee that a particular node will remain stable |
| Strong value for background and batch workloads | Not suitable for every workload type |

If the application can tolerate the loss of an individual machine, restart jobs, preserve intermediate results, or simply does not depend on the life of one particular node, the lower price can deliver a very strong benefit.

But if the workload depends on continuous availability, local state, or strict response-time guarantees, that benefit quickly becomes much less clear. Savings at the VM level can easily turn into much more expensive failures at the service level.

That is exactly why these machines should be evaluated not by the discount alone, but by how the workload behaves when an interruption actually happens.

One task may survive the loss of a single node and simply restart. Another may lose state, break execution, and bring down the service as a whole. Which naturally leads to the next practical question: which workloads are a good fit for Spot / Preemptible VMs, and where should they be avoided entirely?

What Kinds of Workloads They Fit in Practice

Workloads That Are Easy to Restart

The most obvious scenario is batch processing.

If the work is done in batches, in chunks, or through a queue, the loss of one VM usually does not break the entire process. A failed portion can simply be rerun, while the workload itself can be redistributed across other nodes.
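A minimal sketch of that queue behavior, in Python: a chunk whose worker is interrupted goes back on the queue, where a node can pick it up again. Interruptions are simulated with a seeded random generator, and the worker names and the `interrupt_rate` parameter are hypothetical. A real system would learn about a lost node from the provider’s interruption notice or from an expired lease in the queue.

```python
import random
from collections import deque


def run_batch(chunks, workers, interrupt_rate, seed=0, max_attempts=10_000):
    """Process every chunk, requeueing chunks whose worker was interrupted.

    Returns (which worker slot finished each chunk, total attempts made).
    Interruptions are simulated; `workers` are hypothetical node/slot names.
    """
    rng = random.Random(seed)
    queue = deque(chunks)
    finished = {}
    attempts = 0
    while queue:
        if attempts >= max_attempts:
            raise RuntimeError("too many interruptions; giving up")
        chunk = queue.popleft()
        worker = workers[attempts % len(workers)]  # simple round-robin
        attempts += 1
        if rng.random() < interrupt_rate:
            # The VM was reclaimed mid-chunk: only this chunk's partial work
            # is lost, so put the chunk back on the queue to be retried.
            queue.append(chunk)
            continue
        finished[chunk] = worker
    return finished, attempts


finished, attempts = run_batch(range(10), ["vm-a", "vm-b", "vm-c"], interrupt_rate=0.3, seed=42)
print(f"{len(finished)} chunks done in {attempts} attempts")
```

The important property is that the loss of a VM costs one chunk’s worth of work, not the whole run, which is why chunk size is a direct lever on how much an interruption hurts.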

CI/CD tasks also fit very naturally here.

Build jobs, test runs, one-off pipeline executions, temporary runners, and other short-lived compute tasks rarely require one specific machine to stay alive for long. For these workloads, what usually matters more is the overall execution cost and the ability to start a new instance quickly instead of relying on the one that was lost.

A similar logic applies to background compute workloads.

These may include asynchronous jobs, data processing, rendering, calculations, or other processes where the overall result matters more than the fate of any single VM. If the system can retry the task after a failure, then cheap interruptible capacity becomes a very reasonable fit.
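Retries are usually driven by a scheduler or orchestrator running on stable capacity, not by the interrupted VM itself. The sketch below assumes such an orchestrator: `task` is a hypothetical callable that dispatches one job to some worker and raises the hypothetical `Interrupted` exception if that worker was reclaimed. It retries with capped exponential backoff and full jitter, a common way to keep every retry from hitting freshly started nodes at the same moment.

```python
import random
import time


class Interrupted(Exception):
    """Hypothetical error raised when the worker running a task was reclaimed."""


def retry(task, max_attempts=5, base_delay=1.0, max_delay=30.0,
          sleep=time.sleep, rand=random.random):
    """Call `task` until it succeeds, backing off between interrupted attempts.

    Capped exponential backoff with full jitter: after failure n, wait a random
    time in [0, min(max_delay, base_delay * 2**(n-1))) before trying again.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Interrupted:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(cap * rand())
```

`sleep` and `rand` are injectable so the backoff can be tested without real waiting; in production the defaults apply.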

Imagine, for example, an online store selling lawn mowers. At night, it recalculates recommendations, refreshes search indexes, and runs batch processing against the catalog. If one Spot VM disappears in the middle of that work, the job can simply be retried or picked up by another node. For this kind of background workload, losing a particular machine is unpleasant, but not critical.


Scenarios Where Losing One Instance Is Acceptable

Another good fit is dev/test environments.

If the team is spinning up temporary staging setups, machines for validating changes, or short-lived environments for experiments, the survival of any particular VM is usually not critical. The cost savings, however, can become quite noticeable, especially when there are many such environments or they are created regularly.

The same applies to broader fault-tolerant workloads.

These are workloads that are designed from the start with the assumption that an individual node may disappear at any time. Systems like this usually rely on a job queue, retry logic, checkpointing, or some other distribution model in which the loss of one VM does not bring down the whole service.
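Checkpointing is what keeps an interruption from throwing away all completed work. The following is a minimal sketch under simple assumptions: the job just sums a list of numbers, progress is stored as JSON, and the hypothetical `simulate_interrupt_at` parameter stands in for a real reclaim. In practice, the checkpoint file should live on storage that outlives the VM (object storage or a persistent disk), not on the instance’s local disk.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return saved progress, or a fresh state if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_item": 0, "total": 0}


def save_checkpoint(path, state):
    """Write the state to a temp file first, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in one step, so a reclaim mid-write leaves
    # either the old checkpoint or the new one in place.
    os.replace(tmp_path, path)


def process(items, path, every=100, simulate_interrupt_at=None):
    """Sum `items`, resuming from the last checkpoint, saving every `every` items."""
    state = load_checkpoint(path)
    for i in range(state["next_item"], len(items)):
        if i == simulate_interrupt_at:
            # The VM "vanished": progress since the last checkpoint is not saved.
            return state
        state["total"] += items[i]
        state["next_item"] = i + 1
        if state["next_item"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

Running `process` again after a simulated interruption at item 450 resumes from item 400 (the last saved checkpoint) and still produces the full sum; only the 50 unsaved items are recomputed.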

That is exactly where Spot / Preemptible VMs show their value most clearly. The less the system depends on a particular instance, the easier it becomes to turn cheap but unstable compute capacity into a real benefit rather than a source of random failures.

Where They Are Better Avoided

The lower price of these machines can look very tempting.

But this is exactly where the main trap begins: not every workload can tolerate the loss of a VM gracefully. If the task depends on continuous availability, local state, or predictable runtime on a specific node, then Spot / Preemptible VMs quickly stop being a cost-saving tool and start becoming a source of unnecessary risk.


Services Where Continuous Uptime Matters

The first problematic area is any service where stable online availability and predictable uptime are essential.

If an application must respond to users continuously and avoid dropping connections, it cannot afford the sudden disappearance of a node, and an interruptible VM becomes too fragile a foundation. Yes, an individual instance can be replaced, but the fact that the provider may reclaim it at any moment already makes this model weaker for sensitive production workloads.

That is why teams are usually especially cautious about using these VMs for:

  • Production APIs
  • Checkout and payment flows
  • Latency-sensitive services
  • Systems with persistent user sessions
  • Single production nodes without a proper standby path

On paper, it may seem that since the instance is significantly cheaper, you can simply move a normal production server onto a Spot / Preemptible VM and reduce costs immediately. But if that workload is not built to survive interruption or eviction, the savings disappear very quickly. One badly timed reclaim event can cost more than the entire price difference between a regular VM and an interruptible one.
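The arithmetic behind that claim is easy to check. In the sketch below, every number is hypothetical rather than a real provider price: four instances at $0.40 per hour, a 60% Spot discount, and about 730 hours in a month. The savings come to roughly $700 per month, so a single incident that costs the business $5,000 in lost orders more than wipes them out.

```python
HOURS_PER_MONTH = 730  # average month: 8760 hours a year / 12


def net_monthly_benefit(on_demand_hourly, spot_discount, instances,
                        incidents_per_month=0.0, cost_per_incident=0.0,
                        hours=HOURS_PER_MONTH):
    """Spot savings minus the expected cost of interruption-caused incidents.

    spot_discount is a fraction: 0.6 means 60% cheaper than a regular VM.
    """
    savings = on_demand_hourly * spot_discount * instances * hours
    expected_loss = incidents_per_month * cost_per_incident
    return savings - expected_loss


# Hypothetical figures, not real prices.
batch = net_monthly_benefit(0.40, 0.6, 4)
checkout = net_monthly_benefit(0.40, 0.6, 4, incidents_per_month=1, cost_per_incident=5000)
print(f"batch: {batch:+.2f}  checkout: {checkout:+.2f}")
```

This prints `batch: +700.80  checkout: -4299.20`: the same discount is a clear win for a restartable batch job and a clear loss for a checkout flow where one interruption has a real business cost.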

Workloads Tied to State and Data

The second risk zone is systems that depend heavily on state, data, or the special role of a particular node.

When an application is tied to local files, an active session, an unfinished write, data stored on disk, or the special role of one machine within a broader design, then losing a VM stops being just an “unpleasant restart.” At that point, you can end up with lost progress, broken connections, write errors, or more serious consequences at the service level.

That is exactly why these VMs are especially questionable for:

  • Databases
  • Stateful services with local state
  • Systems that are sensitive to losing current progress
  • Nodes where data consistency matters
  • Services where the role of a specific machine is hard to replace quickly

Imagine a game studio that keeps temporary match calculations, live game sessions, or pieces of state directly on a particular server. In a test environment, that may still be tolerable. But if the same model is carried into production on interruptible VMs, the disappearance of one instance may no longer mean just a restart; it may mean losing part of the state and creating a poor player experience.

From here, the next logical question is: what should you do if the workload needs cost savings, but interruptible VMs are simply too risky for it? In that situation, the real issue is no longer the discount itself, but choosing a more appropriate compute model.

What to Choose If Spot / Preemptible VMs Are Not a Good Fit

Not every workload is ready to live on interruptible virtual machines.

But that does not mean the team’s only option is to go straight back to the most expensive and least flexible pricing model. In practice, there are almost always intermediate options between price and stability. The real question is no longer how to use Spot at any cost, but how to get the required level of reliability without overpaying for compute.

Here is a simple guide to the options most often considered instead of a pure Spot / Preemptible model:

| Option | When it fits best | What it provides |
|---|---|---|
| Regular VMs | When the workload needs predictable availability and cannot tolerate eviction risk | The simplest and most stable foundation |
| A mixed pool of regular and Spot VMs | When part of the workload is critical, but another part can tolerate interruption | A balance between stability and cost savings |
| Autoscaling with priority on regular nodes | When reserve capacity is needed, but the service should not depend fully on Spot | A softer compromise in terms of risk |
| Clusters and queues with task retry | When the workload can adapt to the loss of individual nodes | A safer way to use discounted capacity |
| Managed services instead of self-managed VMs | When reducing operational risk matters more than saving on every machine | Less dependence on the life of any single instance |
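For the mixed-pool option, one common way to express the split is a fixed base of regular capacity plus a percentage of everything above that base. AWS EC2 Auto Scaling mixed instances policies use parameters along these lines (`OnDemandBaseCapacity` and `OnDemandPercentageAboveBaseCapacity`). The function below is a simplified sketch of that idea, not AWS’s exact algorithm; in particular, it rounds the regular share up, which is an assumption that favors stability.

```python
import math


def split_capacity(desired, on_demand_base, on_demand_pct_above_base):
    """Return (regular, spot) instance counts for a mixed pool.

    Simplified sketch: the first `on_demand_base` instances are regular
    (on-demand) VMs, and `on_demand_pct_above_base` percent of the rest are
    also regular, rounded up. Everything else goes to Spot.
    """
    if desired < 0 or on_demand_base < 0:
        raise ValueError("capacities must be non-negative")
    if not 0 <= on_demand_pct_above_base <= 100:
        raise ValueError("percentage must be between 0 and 100")
    base = min(desired, on_demand_base)
    above = desired - base
    regular = base + math.ceil(above * on_demand_pct_above_base / 100)
    return regular, desired - regular
```

For example, `split_capacity(10, 2, 25)` returns `(4, 6)`: two base regular VMs, 25% of the remaining eight (two more) on regular capacity, and six on Spot. The base keeps the critical minimum alive even if every Spot node is reclaimed at once.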

In many cases, the team is not really choosing between “cheap” and “expensive,” but between different levels of risk.

Sometimes the most reasonable approach is to keep the critical part of the workload on standard VMs and move only the background or flexible tasks onto Spot. This kind of mixed design makes it possible to avoid building the whole service on interruptible capacity while still capturing savings where doing so is actually safe.

In other scenarios, the better alternative is not primarily a pricing one, but an architectural one.

If the service is too sensitive to the loss of an individual machine, then it is often more useful not to search for an “even cheaper VM,” but to choose a model in which the system itself depends less on the life of any one node.

In practice, the logic usually comes down to three paths:

  • Keep the critical workload on regular VMs
  • Build a mixed model using both regular and Spot instances
  • Redesign the workload so that it can genuinely survive interruptions

That last option is especially important.

If the system gains a job queue, retry logic, checkpointing, or a way to distribute work across several nodes, then some scenarios can be moved onto interruptible VMs much more safely. But this works only where the application is actually designed for that model, not where it is merely hoping that a particular instance will survive long enough.

That is why the better way to choose here is not by the size of the discount, but by the character of the workload itself.

First, understand how sensitive the system is to the loss of a node. Only then decide whether it is better served by regular VMs, a mixed pool, or a real architectural redesign around discounted interruptible capacity.
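That order of questions can be written down as a small heuristic. The sketch below is illustrative only: the `Workload` fields and the mapping to “regular”, “mixed”, “spot”, and “redesign” are assumptions that follow this article’s reasoning, not a rule from any provider.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical fields describing how a workload reacts to losing a node.
    keeps_local_state: bool        # depends on data on one node's disk or memory
    user_facing: bool              # must answer users continuously
    restartable: bool              # a lost run can simply be rerun
    has_retry_or_checkpoint: bool  # queue, retry logic, or checkpointing in place


def recommend(w: Workload) -> str:
    """Illustrative heuristic mirroring the three paths described above."""
    if w.keeps_local_state:
        # Losing the node means losing state: keep it on regular VMs.
        return "regular"
    if w.user_facing:
        # Critical baseline on regular VMs, extra stateless capacity on Spot.
        return "mixed"
    if w.restartable or w.has_retry_or_checkpoint:
        return "spot"
    # Neither restartable nor protected: fix the design before using Spot.
    return "redesign"
```

A nightly catalog rebuild that can simply be rerun maps to `"spot"`, a database maps to `"regular"`, and a long job that can neither restart nor save progress maps to `"redesign"`.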

Conclusion

Spot / Preemptible VMs should be treated not as a universal way to make infrastructure cheaper, but as a tool for a specific type of workload.

They work well where the system is designed from the start to tolerate interruptions and does not depend on the survival of one particular node. In all other cases, an attempt to save on compute can easily turn into additional risk for the service.

That is why the right approach here is fairly simple: first evaluate the behavior of the workload itself, and only then decide whether it needs interruptible capacity, a mixed model, or a more stable foundation. In that order, Spot / Preemptible VMs create value rather than problems.

FAQ

Are Spot Instances and Preemptible VMs the same thing?

In practical terms, essentially yes: in both cases, the idea is cheap but interruptible compute capacity. The main difference is provider terminology. In Google Cloud, the main current term is Spot VMs, while Preemptible VMs is the older version of the same model.

Can you run production on these VMs at all?

Sometimes yes, but only if the architecture itself can tolerate losing an individual node without major consequences. For single production servers, critical APIs, or sensitive stateful services, this is usually a poor fit. Both Azure and Google position these machines around interruption-tolerant or fault-tolerant workloads.

How much warning do you get before interruption?

That depends on the cloud. In AWS, Spot typically provides about a two-minute interruption notice before the instance is stopped or terminated. In Azure, eviction behavior depends on the scenario and eviction policy, and preparation often relies on mechanisms such as Scheduled Events. In Google Cloud, Spot VMs may be preempted at any time and receive only a short preemption notice (about 30 seconds) before shutdown, so you should not depend on the long life of any one node.
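On AWS, the interruption notice can be read from the instance metadata service at `spot/instance-action`, which returns 404 until an interruption is scheduled and then a small JSON document with the action and time. Azure exposes evictions as `Preempt` events in Scheduled Events, and Google Cloud reports preemption through the `instance/preempted` metadata value. The sketch below shows the parsing side for all three and a polling helper for AWS (IMDSv2); the function names are hypothetical, and a real agent would poll every few seconds and trigger a checkpoint or drain when a notice appears.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime

AWS_IMDS = "http://169.254.169.254/latest"
# Azure Scheduled Events endpoint (requires the header "Metadata: true").
AZURE_EVENTS = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
# Google Cloud metadata value (requires the header "Metadata-Flavor: Google").
GCP_PREEMPTED = "http://metadata.google.internal/computeMetadata/v1/instance/preempted"


def parse_aws_instance_action(body):
    """Parse AWS spot/instance-action JSON into (action, timezone-aware time)."""
    doc = json.loads(body)
    when = datetime.fromisoformat(doc["time"].replace("Z", "+00:00"))
    return doc["action"], when


def parse_azure_preempts(body):
    """Return the Preempt events from an Azure Scheduled Events document."""
    doc = json.loads(body)
    return [e for e in doc.get("Events", []) if e.get("EventType") == "Preempt"]


def parse_gcp_preempted(body):
    """Google Cloud returns the text TRUE once the VM has been preempted."""
    return body.strip().upper() == "TRUE"


def fetch_aws_instance_action(timeout=2.0):
    """Poll AWS IMDSv2. Returns None while no interruption is scheduled (404)."""
    token_req = urllib.request.Request(
        f"{AWS_IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{AWS_IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return parse_aws_instance_action(resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

Keeping parsing separate from fetching means the reaction logic can be tested off-cloud; only `fetch_aws_instance_action` needs to run on an actual EC2 instance.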

Are these VMs a good fit for Kubernetes, queues, and batch tasks?

Yes. That is one of the most common use cases. Google explicitly discusses the use of preemptible/Spot VMs for fault-tolerant workloads in GKE, and Azure recommends Spot VMs for batch processing, dev/test, and large compute workloads.

If the task is long-running, does that mean these VMs are not suitable?

Not necessarily. They can still work if the task can save progress, restart cleanly, or be split into parts. The real issue is not the duration itself, but whether the workload depends on the uninterrupted life of one specific VM.

What is the advantage of a Spot VM over a regular VM besides price?

The main advantage is still the price, along with the ability to scale flexible workloads more cheaply. But there is also an architectural benefit: it pushes teams toward designing systems that depend less on one specific node. If the workload is not ready for that model, a regular VM is often the more practical option.

Sources

1. AWS: Spot Instance interruptions
2. AWS: Spot Instance interruption notices
3. Microsoft Learn: About Azure Spot Virtual Machines
4. Google Cloud: Spot VMs
