The premise
Cloud management in 2026 looks superficially like the cloud management of three years ago: the same providers, the same primitives, the same vocabulary. Underneath, the shape of the workload has changed enough that the playbook has to be rewritten.
Inference workloads, GPU spend, data-residency regulation, and the operational realities of running AI features at production scale have pushed cloud from a cost-out conversation to a capability conversation. The team that treats it only as a bill to be optimized is no longer competitive on what the application can do.
This article is about what changed, what the new operational defaults look like, and how SDEN approaches cloud engagements in this new environment.
The bill is the symptom, not the disease
Cloud cost overruns in 2026 are almost always an architecture problem, not a discounting problem.
A familiar engagement pattern is the finance team escalating a cloud bill that doubled in six months. The reflex is to negotiate harder with the vendor or to chase the obvious culprits: idle instances, oversized databases, snapshots nobody is using. The reflex captures real savings on the first pass and almost nothing on the second.
The real driver of cost in 2026 cloud deployments is usually structural: a workload pattern that does not match the pricing model of the resources it runs on. Bursty AI inference on always-on instances. Analytical queries against the operational database. Network egress between regions that nobody noticed in the architecture review. Logs and traces collected at full fidelity, retained indefinitely.
Fixing the architecture produces order-of-magnitude savings. Negotiating the contract produces savings in the single-digit percentages. The order matters: fix the architecture first, then negotiate.
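Here is a minimal Python sketch that makes the pricing-model mismatch concrete. It compares two ways to pay for the same day of inference traffic. Every price and throughput figure is a hypothetical placeholder. `Pricing`, `always_on_cost`, and `per_use_cost` are illustrative names, not any provider's API. Always-on capacity has to be sized for the peak hour and then bills for every hour.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    instance_hourly: float           # $/hour for one always-on GPU instance (hypothetical)
    per_request: float               # $/request on a pay-per-use endpoint (hypothetical)
    requests_per_instance_hour: int  # requests one instance can serve per hour (hypothetical)


def always_on_cost(hourly_requests: list[int], p: Pricing) -> float:
    """Size the fleet for the busiest hour, then pay for it every hour."""
    peak = max(hourly_requests)
    instances = -(-peak // p.requests_per_instance_hour)  # ceiling division
    return instances * p.instance_hourly * len(hourly_requests)


def per_use_cost(hourly_requests: list[int], p: Pricing) -> float:
    """Pay only for the requests that actually arrive."""
    return sum(hourly_requests) * p.per_request


if __name__ == "__main__":
    prices = Pricing(instance_hourly=4.0, per_request=0.01, requests_per_instance_hour=1000)
    bursty = [50] * 22 + [4000] * 2  # quiet day with two spikes
    steady = [1000] * 24             # flat load all day
    for name, day in (("bursty", bursty), ("steady", steady)):
        print(f"{name}: always-on {always_on_cost(day, prices):.2f}, "
              f"per-use {per_use_cost(day, prices):.2f}")
```

Take the placeholder prices and a mostly quiet day with two 4,000-request spikes. That day costs 384.00 on always-on instances and 91.00 pay-per-use. A flat 1,000 requests per hour flips the result: 96.00 versus 240.00. Neither pricing model is cheaper in general. The cheaper one is whichever matches the shape of the load.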
From provisioning to capability
Cloud engineering still includes the work it has always included: architecture design, provisioning through infrastructure-as-code, deployment automation, observability, and the operational discipline of running production. That is the floor.
What is new is the layer above the floor. Capacity planning for inference workloads with bursty, expensive GPU profiles. Multi-region architecture that respects data-residency regulation in a world where US, Canadian, and EU rules diverge meaningfully. Hybrid models where the operational workload runs on the cheapest available compute and the model serving runs where the latency and the licensing allow. None of these were core to the cloud engineer's job five years ago. All of them are now.
SDEN's cloud engagements increasingly look like architectural engagements (designing the shape of the deployment) rather than provisioning engagements. The provisioning has been automated for years; the design has not.
Operational defaults for an AI-shaped workload
AI workloads break some of the assumptions classical cloud architectures rely on. They are bursty rather than steady, expensive per call rather than per byte, dependent on third-party providers whose latency and availability the cloud architect does not control, and sensitive to where the data is physically located in ways the rest of the application is not.
The operational defaults that work for this shape are different. Caching, batching, and graceful degradation at the application layer become first-class concerns. Provider abstraction at the model layer becomes mandatory, because every quarter the right model is a different model. And cost observability has to be wired into the application itself, not just into the cloud bill, because the unit economics of an AI feature are decided per request and need to be visible per request.
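The sketch below combines two of these defaults, assuming a hypothetical `ModelProvider` interface that each vendor SDK would be wrapped to fit. Callers depend on `CachedModel`, so swapping the underlying model is a single call. Repeated prompts are served from a cache instead of triggering a paid model call. The cache here is a plain in-memory dict for illustration. A production cache would need eviction, a TTL, and a shared store. Batching is not shown.

```python
import hashlib
from typing import Protocol


class ModelProvider(Protocol):
    """Hypothetical interface: each vendor SDK gets a thin adapter with this shape."""

    name: str

    def complete(self, prompt: str) -> str: ...


class CachedModel:
    """Callers use this class; the provider behind it can change without them noticing."""

    def __init__(self, provider: ModelProvider) -> None:
        self.provider = provider
        self._cache: dict[str, str] = {}  # in-memory only; no eviction or TTL
        self.hits = 0
        self.misses = 0

    def swap_provider(self, provider: ModelProvider) -> None:
        # Cache keys include the provider name, so answers from the old
        # model are never served as if they came from the new one.
        self.provider = provider

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(f"{self.provider.name}\x00{prompt}".encode()).hexdigest()
        if key in self._cache:
            self.hits += 1  # cache hit: no model call, no model cost
            return self._cache[key]
        self.misses += 1
        answer = self.provider.complete(prompt)
        self._cache[key] = answer
        return answer
```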
When SDEN designs cloud architecture for an AI-using product, this is where most of the work goes: not into Kubernetes manifests, but into the layer that decides what happens when the model takes too long, costs too much, or returns the wrong thing.
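The three failure modes in that sentence map onto three checks. The sketch below uses hypothetical names (`GuardPolicy`, `guarded_call`) and assumed thresholds. It skips the call when the estimated cost exceeds a per-request ceiling. It stops waiting after a timeout. It rejects answers that fail a caller-supplied validity check. In every case it returns a degraded fallback, along with a label saying which branch fired.

```python
import concurrent.futures as cf
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardPolicy:
    timeout_s: float     # longest the feature is willing to wait (assumed value)
    max_cost_usd: float  # per-request spending ceiling (assumed value)


def guarded_call(
    call: Callable[[str], str],
    prompt: str,
    estimated_cost_usd: float,
    is_valid: Callable[[str], bool],
    fallback: str,
    policy: GuardPolicy,
    pool: cf.ThreadPoolExecutor,
) -> tuple[str, str]:
    """Return (answer, outcome). Every failure path degrades to `fallback`."""
    if estimated_cost_usd > policy.max_cost_usd:
        return fallback, "over_budget"  # costs too much: do not call at all
    future = pool.submit(call, prompt)
    try:
        answer = future.result(timeout=policy.timeout_s)
    except cf.TimeoutError:
        # Takes too long: stop waiting. The worker thread is not cancelled.
        return fallback, "timeout"
    except Exception:
        return fallback, "error"  # provider failure
    if not is_valid(answer):
        return fallback, "invalid"  # returns the wrong thing
    return answer, "ok"
```

A timed-out call keeps running in its worker thread, because this sketch only stops waiting for it. Whether the request can truly be cancelled depends on the client library in use.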
What changes in the cloud stack when AI shows up
Four practical shifts are visible in production deployments where AI has become a load-bearing part of the application.
Before · Capacity is planned against steady-state CPU and memory profiles. The autoscaler keeps the cluster within range, and the bill is predictable.
With AI · Capacity is planned against bursty GPU profiles with order-of-magnitude swings between idle and peak. Autoscaler decisions are now business decisions: more capacity means more cost-per-request.
Takeaway · Capacity planning becomes a unit-economics conversation, not an SRE conversation.
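This minimal sketch shows why the autoscaler setting becomes a unit-economics lever. It uses hypothetical prices and throughput figures, and `GpuPool`, `replicas_needed`, and `cost_per_request` are illustrative names. Spare headroom protects latency during a burst. Every spare replica, though, is paid for by the requests that do arrive.

```python
import math
from dataclasses import dataclass


@dataclass
class GpuPool:
    replica_hourly_usd: float   # price of one GPU replica per hour (hypothetical)
    max_rps_per_replica: float  # sustainable requests/second per replica (hypothetical)


def replicas_needed(rps: float, pool: GpuPool, headroom: float) -> int:
    """Replicas to serve `rps` with a fraction `headroom` of spare capacity (0.5 = 50%)."""
    return max(1, math.ceil(rps * (1 + headroom) / pool.max_rps_per_replica))


def cost_per_request(rps: float, replicas: int, pool: GpuPool) -> float:
    """Hourly fleet spend divided by requests served in that hour (rps must be > 0)."""
    return replicas * pool.replica_hourly_usd / (rps * 3600)


if __name__ == "__main__":
    pool = GpuPool(replica_hourly_usd=3.60, max_rps_per_replica=10)
    for headroom in (0.0, 0.5):
        n = replicas_needed(50, pool, headroom)
        print(f"headroom {headroom:.0%}: {n} replicas, "
              f"${cost_per_request(50, n, pool):.6f} per request")
```

The placeholder numbers are 3.60 per replica-hour and 10 requests per second per replica, at a load of 50 requests per second. Running with no headroom needs 5 replicas at 0.0001 per request. Keeping 50% headroom needs 8 replicas and raises the cost of every request by 60%. Which setting is right is a product decision about how much burst tolerance the feature is worth.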
Before · Observability tracks request latency, error rates, and saturation: the classical golden signals.
With AI · Observability tracks model latency, model errors, model cost-per-request, model output quality, and the cache hit rate that determines whether the feature is solvent.
Takeaway · The dashboard the SRE looks at has new rows. Some of them belong to the product team.
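Here is a minimal sketch of those new rows, computed from per-request records. `ModelCall` and its fields are assumptions about what the application would log for each model call. The percentile uses the nearest-rank method to keep the arithmetic easy to check.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelCall:
    """One logged model call (hypothetical schema)."""
    feature: str
    latency_ms: float
    cost_usd: float
    error: bool
    cache_hit: bool
    quality_ok: bool  # result of whatever output check the product defines


def nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value covering pct% of the sample."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def model_dashboard(calls: list[ModelCall]) -> dict[str, float]:
    """The AI-specific rows: latency, errors, cost, quality, and cache hit rate."""
    n = len(calls)
    if n == 0:
        return {}
    return {
        "p95_latency_ms": nearest_rank([c.latency_ms for c in calls], 95),
        "error_rate": sum(c.error for c in calls) / n,
        "cost_per_request_usd": sum(c.cost_usd for c in calls) / n,
        "quality_pass_rate": sum(c.quality_ok for c in calls) / n,
        "cache_hit_rate": sum(c.cache_hit for c in calls) / n,
    }
```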
Before · The cloud bill is broken down by service: compute, storage, network.
With AI · The cloud bill is broken down by product feature, with model spend attributed to the workflow that triggered it. Finance can answer the question of what an AI feature actually costs.
Takeaway · Attribution becomes a first-class engineering deliverable, because the alternative is flying blind.
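The following sketches one way to produce that breakdown, under two assumptions. First, each request is tagged with the feature that triggered it and carries its own model cost. Second, a shared infrastructure bill for the period is split across features in proportion to request count. The proportional rule is only one possible allocation policy, and `attribute_costs` is a hypothetical name.

```python
from collections import Counter, defaultdict


def attribute_costs(
    requests: list[tuple[str, float]], shared_infra_usd: float
) -> dict[str, float]:
    """Per-feature cost for one billing period.

    `requests` holds (feature, model_cost_usd) pairs, one per request.
    Model spend goes directly to the feature that triggered it; the shared
    infrastructure bill is split in proportion to each feature's request count.
    """
    direct: defaultdict[str, float] = defaultdict(float)
    counts: Counter[str] = Counter()
    for feature, model_cost in requests:
        direct[feature] += model_cost
        counts[feature] += 1
    total = sum(counts.values())
    return {
        feature: direct[feature] + shared_infra_usd * count / total
        for feature, count in counts.items()
    }
```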
Before · Region selection is a one-time architectural decision at project kick-off.
With AI · Region selection is revisited every time a new regulation lands or a new provider opens a region, and the architecture is designed so that the move is a configuration change, not a rewrite.
Takeaway · Data-residency rules now move faster than projects. The architecture has to keep up.
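Here is a minimal sketch of region selection as configuration. It assumes a hypothetical residency policy that maps each jurisdiction to the regions where its data may live. The region identifiers follow AWS naming purely for illustration, and the mapping is not a statement of what any regulation actually permits. When a provider opens a new region, the change is one line in the policy, not a change to the code that calls `pick_region`.

```python
from typing import Optional

# Hypothetical policy: which regions each jurisdiction's data may live in.
# Region names follow AWS conventions for illustration only.
RESIDENCY_POLICY: dict[str, list[str]] = {
    "EU": ["eu-central-1", "eu-west-1"],
    "CA": ["ca-central-1"],
    "US": ["us-east-1", "us-west-2"],
}


def pick_region(
    jurisdiction: str,
    policy: dict[str, list[str]],
    available: set[str],
    preferred: Optional[str] = None,
) -> str:
    """Return a region where this jurisdiction's data may live and we can deploy."""
    allowed = policy.get(jurisdiction)
    if not allowed:
        raise ValueError(f"no residency rule for {jurisdiction!r}")
    candidates = [r for r in allowed if r in available]
    if not candidates:
        raise LookupError(f"no deployable region satisfies {jurisdiction!r}")
    if preferred in candidates:
        return preferred
    return candidates[0]  # policy order expresses the default preference
```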
Three defaults on every cloud engagement
We do not ship clouds. We ship architectures that happen to run on one. The three defaults below are what we hold to.
Infrastructure as code, end-to-end
Every piece of the infrastructure is described in code, versioned, reviewed, and reproducible. We do not click-ops production. We do not click-ops staging either.
Cost observability at the feature level
Cost is attributed to features and traced to the requests that drove it. Surprises in the bill become exceptions, not the norm.
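Making surprises the exception implies that something actually notices them. The minimal sketch below assumes daily spend is already attributed per feature. It flags any feature whose spend today exceeds a multiple of its trailing daily average. It also treats spend with no baseline at all as a surprise. The 2x factor is an arbitrary placeholder, and `spend_alerts` is a hypothetical name.

```python
from statistics import mean


def spend_alerts(
    history: dict[str, list[float]], today: dict[str, float], factor: float = 2.0
) -> list[str]:
    """Features whose spend today exceeds `factor` x their trailing daily mean."""
    alerts = []
    for feature, spend in today.items():
        past = history.get(feature, [])
        if not past:
            alerts.append(feature)  # new spend with no baseline is itself a surprise
        elif spend > factor * mean(past):
            alerts.append(feature)
    return sorted(alerts)
```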
Region and provider portability where it matters
We pick one cloud as the home base, but the architecture keeps escape paths open for the workloads that may need them, whether for data-residency, sovereign-cloud, or cost-driven reasons.
The infrastructure the team trusts to grow into
Cloud success is invisible. The team uses the infrastructure and stops thinking about it.
A working cloud architecture lets the engineering team focus on the product, not on the plumbing. Deployments are routine. The bill is predictable, and when it is not, the team can explain why. Capacity is provisioned ahead of demand and decommissioned when demand ends. Regulators are answered with documentation that was generated, not retroactively assembled.
When SDEN finishes a cloud engagement, the test is simple: does the engineering team operate the system comfortably without us six months later? If they need us, we have not done the job.
The technology under the hood matters less than this property. It can be AWS, GCP, Azure, or a hybrid; it can be Kubernetes, Nomad, or serverless. What it cannot be is a black box only one engineer understands.
Cloud: questions we get asked
Direct answers to the questions we get asked the most. If yours isn't covered, write to the team.