nOps is an automated cloud cost optimization platform. We manage over $4 billion in annual cloud spend across AWS, GCP, and Azure — optimizing commitment-based discounts hourly, using ML to balance savings rates against lock-in risk, and providing cost visibility, forecasting, and anomaly detection through a centralized FinOps platform. Our pricing model is performance-based: we charge a percentage of the savings we deliver.

All of that is data-intensive. Every hour, we analyze usage patterns across thousands of accounts, evaluate commitment portfolios across three clouds and dozens of services, and make automated purchasing decisions.

The analytical backbone has been Databricks Lakehouse for years — processing billions of rows of cost and usage data with Spark, running ML models, and orchestrating our data pipelines. But as spend under management continued to scale, we saw an opportunity to simplify the architecture and bring the application layer closer to the Lakehouse.

The Architecture We Had

Our previous setup split responsibilities the way most data-heavy applications do. Analytics and metric computation lived in the Lakehouse. Customer-facing data — account configurations, user preferences, client-specific state — lived in a separate relational database.

The seams between the two systems created friction. Scheduled jobs and cron-based change detection kept things in sync, but data that was live in one system could take minutes to appear in the other. The operational overhead of maintaining a separate database stack — provisioning, patching, monitoring, capacity planning — pulled engineering time away from product work.

When we expanded to multi-cloud support across compute and non-compute services, the growing volume of workloads strained the architecture further. We had more data sources, more commitment types, more customer accounts flowing through the system. The team decided it was time to look at the infrastructure again — aiming to reduce the distance between our analytical layer and our application layer.

Why We Chose Lakebase

We selected Databricks Lakebase as our OLTP backbone. Jordan Stein, Director of Product, described the three factors that drove the decision:

Tight coupling to the Lakehouse. This was the biggest factor. With a separate Postgres instance, our data engineering teams relied on scheduled jobs and cron-based change detection to pick up changes in customer data. With Lakebase, the moment data is live in the operational database, it's consumable by the Lakehouse — without sync lag or scheduled polling. “We are talking scheduled jobs that had to run crons that are coming and picking up those changes, whereas now we know that the moment it's live, we can consume it. This has been a game changer for us.”

Auto-scaling and auto-stop. Lakebase's serverless autoscaling adjusts compute to match traffic and scales to zero when idle. Our usage patterns are spiky — dashboards get heavy traffic during business hours, commitment processing runs in batches, anomaly detection fires on schedule. Autoscaling lets capacity follow that demand curve instead of being provisioned for peak. For a platform that processes spend data at our scale, not paying for idle database compute was a meaningful operational improvement.

Ease of adoption. Lakebase is Postgres-compatible — standard Postgres interfaces, extensions, and tooling work without modification. That meant no application rewrites. Our existing libraries, ORMs, and SQL tools carried over directly. Features like point-in-time restore and flexible OAuth roles added operational safety without new complexity. And because Lakebase lives within the Databricks workspace, there was no new platform to onboard — our engineers were already working in Databricks every day.
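As a purely illustrative sketch of what "no application rewrites" meant in practice: because Lakebase speaks standard Postgres, the application keeps an ordinary Postgres connection URL and only the endpoint and credentials change. The environment variable names and host format below are assumptions for illustration, not nOps code.

```python
import os

# Hypothetical sketch: an app's existing Postgres DSN stays structurally
# identical -- pointing it at Lakebase is a configuration change, not a
# code change. Env var names and the default host here are made up.
def build_dsn() -> str:
    host = os.environ.get("DB_HOST", "my-instance.example.databricks.com")
    port = os.environ.get("DB_PORT", "5432")
    user = os.environ.get("DB_USER", "app_user")
    password = os.environ.get("DB_PASSWORD", "oauth-token")
    dbname = os.environ.get("DB_NAME", "app")
    # Any standard Postgres driver or ORM (psycopg, SQLAlchemy, etc.)
    # accepts a URL in this shape unchanged.
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}?sslmode=require"

dsn = build_dsn()
```

The point of the sketch is that the driver, the ORM models, and the SQL on top of this DSN are untouched — which is why the migration timeline stayed realistic.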

What the Architecture Looks Like Now

Lakebase serves as our central Postgres database and single source of truth for the front-end application and our AI infrastructure. The Lakehouse continuously consumes data from Lakebase for analysis and metric computation.

The key architectural shift is directionality. Data flows from Lakebase into the Lakehouse for analytics — one direction, with native Unity Catalog integration handling governance across both layers. The platform auto-discovers and surfaces Databricks Metric Views, so computed metrics are immediately available without additional pipeline work.

The rest of the stack: Vercel for hosting, WorkOS for auth, Databricks for everything data. We consolidated where consolidation made sense and kept the pieces that were already working well.

What this eliminated was the sync tax — the custom ETL, the cron jobs, the change-detection logic that existed purely to bridge the gap between two systems that stored overlapping data. Databricks' one-click Delta Lake sync replaced infrastructure that our team had built and maintained by hand. One governance model across OLTP and analytics, rather than managing access controls in two places.
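For context on the change-detection logic that went away, here is an illustrative sketch of the cron-style polling pattern that continuous sync replaced. It uses Python's built-in sqlite3 purely as a stand-in for the old standalone Postgres database; the table and column names are hypothetical.

```python
import sqlite3

# Stand-in for the old separate operational database (sqlite3 here only
# so the sketch is self-contained; the real system was Postgres).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account_config (id INTEGER, updated_at TEXT)")
conn.execute("INSERT INTO account_config VALUES (1, '2024-01-01T00:00:00')")
conn.execute("INSERT INTO account_config VALUES (2, '2024-06-01T00:00:00')")

def poll_changes(conn, last_sync: str):
    """What a scheduled job did on every cron tick: scan for rows
    modified since the last successful sync, then ship them to the
    analytics layer. Hypothetical schema for illustration."""
    rows = conn.execute(
        "SELECT id FROM account_config WHERE updated_at > ?", (last_sync,)
    ).fetchall()
    return [r[0] for r in rows]

# Each tick only sees changes since the watermark -- anything newer
# waits until the next scheduled run, which is where the minutes of
# sync lag came from.
changed = poll_changes(conn, "2024-03-01T00:00:00")
```

Every piece of this — the watermark bookkeeping, the scheduler, the monitoring around it — is infrastructure that existed only to bridge two systems, which is exactly the work the managed sync removed.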

What Changed for Customers

The immediate effects were on the platform side: faster data pipelines, lower operational overhead, and engineering bandwidth freed up for product development instead of infrastructure maintenance.

For customers, that translated to a more responsive platform. Cost data, commitment recommendations, and anomaly alerts surface faster when there's less plumbing between computation and delivery. As we add more data sources, services, and commitment types, the architecture gives us room to scale without adding the same synchronization overhead.

It also made our AI capabilities more practical to operate. ML models that inform commitment purchasing decisions can now work against the same data layer that serves the application, rather than reading from a snapshot that's minutes behind. For a platform that makes automated purchasing decisions every hour, that tighter feedback loop matters.

What We Learned Along the Way

The sync tax is easy to undercount. When sync jobs work, nobody thinks about them. But if you add up the engineering time spent building, monitoring, debugging, and updating pipelines that existed purely to move data between two systems, eliminating that work frees up a meaningful amount of capacity.

Postgres compatibility made the timeline realistic. If Lakebase had required a proprietary query interface or major application rewrites, the project would have been more involved. Our engineers didn’t need retraining, and most of the application code carried over as-is. That made the transition faster and less disruptive than it otherwise would have been.

Auto-scaling changes how you think about database provisioning. We'd been in the habit of provisioning for peak load and accepting the idle cost as a given. Lakebase's scale-to-zero model forced us to rethink that assumption. For a cloud cost optimization company, having our own infrastructure reflect the efficiency principles we sell to customers felt like the right alignment.

Consolidation has compounding benefits. One governance model, one workspace, one set of access controls. Each simplification is small on its own, but they compound — fewer context switches for engineers, fewer integration points to audit, fewer places where a configuration drift can cause a production issue.

About nOps

nOps delivers cloud cost optimization — automated commitment management and cost visibility across AWS, GCP, and Azure. You can book a free savings analysis with one of our FinOps experts to try it out in your own environment.

nOps was recently ranked #1 with five stars in G2’s cloud cost management category, and we optimize $4+ billion in cloud spend for our customers.