KubeReconcile

FinOps · 12 min read · June 1, 2026

How to reconcile your Kubernetes cost estimate with the real cloud bill

Every platform team that runs Kubernetes on AWS or GCP eventually hits the same wall: the in-cluster cost estimate — from kubecost, OpenCost, or a homegrown script — says one number, and the cloud bill says something 20–60% larger. This is a field guide to closing that gap deterministically, the exact method our engine automates.

Why the estimate and the bill never match

In-cluster cost tools price what they can see: the CPU and memory requested (or used) by pods, multiplied by a node-cost rate. That is a genuinely useful number — it tells you which workloads consume which resources. But it is structurally incapable of matching the invoice, for five reasons:

  1. Idle / unallocated capacity. You pay for whole nodes. If a node is 55% requested, the other 45% is real money the estimate never assigns to anyone.
  2. Network and data transfer. Cross-AZ chatter, NAT gateway processing, egress, and load-balancer hours are billed at the account level and are nearly invisible inside the cluster.
  3. Shared and control-plane spend. The EKS/GKE management fee, managed Prometheus/Grafana, container registry, and CloudWatch ingestion rarely appear in the allocation export.
  4. Storage over-provisioning. A 500 GB gp3 volume backing a 120 GB PersistentVolume bills for 500 GB. Snapshots and orphaned disks compound it.
  5. Discounts and credits. Reserved Instances, Savings Plans, and committed-use discounts make the bill lower than an on-demand estimate — pulling the variance the other way and masking the items above.

The reconciliation question isn't "what does each pod cost?" It's "why is the bill different from what we projected, and who owns the difference?"
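Reasons 1 and 4 above are plain arithmetic, and a short sketch makes their size concrete. The node cost and per-GB price below are hypothetical placeholders, not quoted cloud rates; only the 55% and 500 GB / 120 GB figures come from the list above.

```python
from decimal import Decimal


def idle_cost(node_cost, requested_fraction):
    """Cost of the part of a node that no pod has requested."""
    return node_cost * (Decimal("1") - requested_fraction)


def storage_gap(provisioned_gb, pv_gb, price_per_gb_month):
    """Billed storage that a PersistentVolume-based estimate cannot see."""
    return (provisioned_gb - pv_gb) * price_per_gb_month


# Hypothetical rates: a node costing 140 per month, storage at 0.08 per GB-month.
idle = idle_cost(Decimal("140"), Decimal("0.55"))
gap = storage_gap(500, 120, Decimal("0.08"))
print(f"idle per node: {idle}, unseen storage per volume: {gap}")
```

Multiplied across a node group and a fleet of volumes, these two per-unit gaps are what later show up as the idle and storage buckets.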

The two inputs you already have

You don't need live cluster access to reconcile. You need two exports: the cloud provider's billing export and the allocation export from your in-cluster cost tool.

Step 1 — Put both sides on the same axis

The bill and the estimate use different vocabularies, so first normalize both onto five categories: compute, storage, network, managed, and other. For the bill, classify each line by its service and usage type (EC2/GCE → compute, EBS/PD/snapshots → storage, NAT/ELB/data-transfer → network, EKS/GKE/CloudWatch/registry → managed). For the estimate, map cpuCost + ramCost + gpuCost → compute, pvCost → storage, and so on.
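A minimal sketch of this normalization, assuming bill lines arrive as (service, usage type, cost) tuples and estimate rows as dictionaries keyed by cost field. The keyword rules are illustrative guesses at what billing strings look like, not a complete or authoritative mapping; anything unmatched falls into other.

```python
from decimal import Decimal

CATEGORIES = ("compute", "storage", "network", "managed", "other")

# Hypothetical keyword rules, checked in order. Storage and network come
# before compute because such lines are often billed under the compute service.
BILL_RULES = [
    ("storage", ("ebs", "snapshot", "persistent disk")),
    ("network", ("natgateway", "loadbalancer", "datatransfer")),
    ("managed", ("eks", "gke", "cloudwatch", "registry")),
    ("compute", ("ec2", "compute engine")),
]

# Estimate fields named in the text; extend this for your own export.
ESTIMATE_FIELDS = {
    "cpuCost": "compute",
    "ramCost": "compute",
    "gpuCost": "compute",
    "pvCost": "storage",
}


def classify_bill_line(service, usage_type):
    """Return the category of one bill line; the first matching rule wins."""
    text = f"{service} {usage_type}".lower()
    for category, keywords in BILL_RULES:
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def normalize_bill(lines):
    """Sum (service, usage_type, cost) tuples into the five categories."""
    totals = {category: Decimal("0") for category in CATEGORIES}
    for service, usage_type, cost in lines:
        totals[classify_bill_line(service, usage_type)] += Decimal(cost)
    return totals


def normalize_estimate(rows):
    """Sum the mapped cost fields of each allocation row into categories."""
    totals = {category: Decimal("0") for category in CATEGORIES}
    for row in rows:
        for field, category in ESTIMATE_FIELDS.items():
            totals[category] += Decimal(str(row.get(field, 0)))
    return totals
```

Both functions return a dictionary with the same five keys, which is what lets the next step subtract one side from the other category by category.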

Step 2 — Decompose the variance, additively

Define variance as bill − estimate. The trick that makes the report trustworthy: compute the per-category difference, because those differences sum exactly to the total variance.

variance      = billTotal    − estimateTotal
idle_compute  = billCompute  − estimateCompute
network       = billNetwork  − estimateNetwork
storage       = billStorage  − estimateStorage
shared        = billManaged  − estimateManaged
untagged      = billOther    − estimateOther
# these five sum to variance, by construction

No hand-wavy percentages. Every dollar of the gap lands in exactly one bucket, and the buckets reconcile to the penny. A negative idle_compute bucket, for instance, is the fingerprint of a Savings Plan: your bill came in under an estimate priced at on-demand rates.
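The decomposition above can be sketched in a few lines. This assumes both sides have already been normalized into dictionaries with the same five category keys; the figures are illustrative, chosen only to echo a $24k estimate against a $37k bill. Decimal is used so the sum check is exact rather than subject to float rounding.

```python
from decimal import Decimal

# Bucket name -> normalized category, following the definitions above.
BUCKETS = {
    "idle_compute": "compute",
    "network": "network",
    "storage": "storage",
    "shared": "managed",
    "untagged": "other",
}


def decompose(bill, estimate):
    """Return (variance, buckets); the buckets sum exactly to the variance."""
    buckets = {name: bill[cat] - estimate[cat] for name, cat in BUCKETS.items()}
    variance = sum(bill.values()) - sum(estimate.values())
    # Holds whenever bill and estimate contain exactly the five categories.
    assert sum(buckets.values()) == variance
    return variance, buckets


# Illustrative figures only.
bill = {"compute": Decimal("27000"), "storage": Decimal("3400"),
        "network": Decimal("5300"), "managed": Decimal("1000"),
        "other": Decimal("300")}
estimate = {"compute": Decimal("21000"), "storage": Decimal("3000"),
            "network": Decimal("0"), "managed": Decimal("0"),
            "other": Decimal("0")}

variance, buckets = decompose(bill, estimate)
print(variance, buckets)
```

If a category key is missing on either side, the lookup fails loudly instead of silently dropping dollars, which is the behavior you want from a reconciliation.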

Step 3 — Redistribute the unallocated cost back to owners

A waterfall of buckets is interesting; an owner-level bill is actionable. So we push each bucket back onto namespaces using a fair-share key — each namespace's share of total cluster compute — with storage allocated by PersistentVolume share instead:

share(ns)        = compute(ns) / totalCompute
trueCost(ns)     = estimate(ns)
                 + idle_compute × share(ns)
                 + shared       × share(ns)
                 + network      × share(ns)
                 + storage      × pvShare(ns)

Because the compute shares and the PersistentVolume shares each sum to one, the namespace true-costs sum back to the actual bill, less the untagged bucket, which this formula leaves unallocated. Roll namespaces up by team label and you have, finally, the real cost of each team, including the idle and network spend they were silently driving.
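A sketch of the redistribution formula, under the assumption that each namespace's estimate is just its compute plus storage cost and that amounts are integers (whole dollars or cents). The namespace names and figures are hypothetical. Fraction keeps the shares exact, so the totals can be checked with equality; the untagged bucket is not allocated by the formula above and stays out of the sum.

```python
from fractions import Fraction


def true_costs(namespaces, buckets):
    """Push variance buckets back onto namespaces by fair share.

    namespaces: {name: {"compute": int, "storage": int}} estimated costs.
    buckets:    variance buckets keyed as in Step 2.
    """
    total_compute = sum(ns["compute"] for ns in namespaces.values())
    total_pv = sum(ns["storage"] for ns in namespaces.values())
    if total_compute == 0 or total_pv == 0:
        raise ValueError("need non-zero compute and PV totals to form shares")

    # Idle, shared, and network all use the same compute-share key.
    by_compute = buckets["idle_compute"] + buckets["shared"] + buckets["network"]
    costs = {}
    for name, ns in namespaces.items():
        share = Fraction(ns["compute"], total_compute)
        pv_share = Fraction(ns["storage"], total_pv)
        costs[name] = (ns["compute"] + ns["storage"]
                       + by_compute * share
                       + buckets["storage"] * pv_share)
    return costs


# Hypothetical namespaces and illustrative bucket values.
namespaces = {
    "payments": {"compute": 10500, "storage": 1500},
    "search": {"compute": 6300, "storage": 900},
    "batch": {"compute": 4200, "storage": 600},
}
buckets = {"idle_compute": 6000, "shared": 1000, "network": 5300,
           "storage": 400, "untagged": 300}

costs = true_costs(namespaces, buckets)
print({name: float(cost) for name, cost in costs.items()})
```

With these figures the estimate totals 24000 and the allocated true costs total 36700; the remaining 300 is the untagged bucket, which needs its own allocation key or a tagging fix before it can be assigned to an owner.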

Step 4 — Write it in English

The output that changes behavior isn't the table; it's the sentence. "Your estimate projected $24k but the bill was $37k; $6k of that is idle headroom on the m5 node group, $5.3k is NAT-gateway and cross-AZ traffic the cluster never sees, and the payments team carries 38% of it." That paragraph is what gets pasted into a budget review and acted on. Big dashboards bury it; a reconciliation report leads with it.
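Turning the buckets into that sentence is mostly string formatting. The sketch below is one hypothetical way to do it: the label wording, the choice of the two largest buckets, and the function names are all assumptions made for illustration, not a description of any particular tool's output.

```python
# Hypothetical human-readable labels for the Step 2 buckets.
LABELS = {
    "idle_compute": "idle compute headroom",
    "network": "network and data transfer",
    "storage": "storage over-provisioning",
    "shared": "shared and control-plane spend",
    "untagged": "untagged spend",
}


def fmt_k(amount):
    """Format a dollar amount in thousands, e.g. 5300 -> '$5.3k'."""
    text = f"{amount / 1000:.1f}"
    if text.endswith(".0"):
        text = text[:-2]
    return f"${text}k"


def narrative(estimate_total, bill_total, buckets, owner, owner_share):
    """Build a one-sentence summary that names the two largest buckets."""
    top = sorted(buckets.items(), key=lambda item: item[1], reverse=True)[:2]
    drivers = " and ".join(f"{fmt_k(value)} is {LABELS[name]}" for name, value in top)
    return (f"Your estimate projected {fmt_k(estimate_total)} but the bill was "
            f"{fmt_k(bill_total)}; {drivers}, "
            f"and {owner} carries {owner_share:.0%} of the gap.")


buckets = {"idle_compute": 6000, "network": 5300, "storage": 400,
           "shared": 1000, "untagged": 300}
sentence = narrative(24000, 37000, buckets, "the payments team", 0.38)
print(sentence)
```

Because the sentence is generated from the same buckets as the table, the numbers in the prose cannot drift from the numbers in the report.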

Common gotchas

Automate it

This whole method is deterministic — which is exactly why it's worth automating rather than rebuilding in a spreadsheet every month. KubeReconcile runs steps 1–4 on your two exports and hands back the narrative, the waterfall, and the per-team true-cost table. The free tier does one cluster, monthly. Try it on the sample data in under a minute.