When we built OCIFinOps, one of the most critical architectural decisions was choosing the analytics engine. We needed something that could handle millions of cost report records, support complex analytical queries, and run without infrastructure overhead. DuckDB was the clear winner.
What is DuckDB?
DuckDB is an in-process analytical database management system. Think of it as "SQLite for analytics" — it runs embedded within your application, requires zero configuration, and excels at analytical workloads (aggregations, joins, window functions).
Why DuckDB for FinOps?
1. Speed
Cloud cost data is naturally analytical: you're constantly grouping by service, summing costs, calculating trends over time, and comparing periods. DuckDB is specifically optimized for these operations using columnar storage and vectorized query execution.
In our benchmarks, aggregating 12 months of cost data (~10 million rows) across all dimensions takes less than 200 ms. The same query took 3-5 seconds on a row-oriented database like PostgreSQL.
2. Parquet Native
OCI cost reports come as CSVs, which we convert to Parquet files. DuckDB can query Parquet files directly — no loading step required. This means:
• Minimal storage overhead (Parquet is highly compressed)
• Instant availability of new data (just add the Parquet file)
• Easy archival (Parquet files are self-describing)
3. Zero Infrastructure
Unlike PostgreSQL, ClickHouse, or BigQuery, DuckDB requires no server. It runs embedded in our Python backend process. This means:
• No database server to manage
• No connection pooling
• No scaling concerns for single-tenant deployments
• Simplified deployment (it's just a pip package)
4. Full SQL Support
DuckDB supports full SQL including window functions, CTEs, and complex joins. This is critical for our text-to-SQL feature — the AI generates standard SQL, and DuckDB executes it without modification.
5. Concurrent Reads
While DuckDB is primarily designed for single-writer workloads, it supports concurrent reads efficiently. For a FinOps dashboard where multiple users query the same data, this works perfectly.
Architecture
Our data pipeline looks like this:
1. Ingest: Pull CSV reports from OCI Object Storage
2. Transform: Convert to Parquet with appropriate data types and partitioning
3. Load: Register Parquet files with DuckDB
4. Query: Serve dashboard queries and NLQ requests via DuckDB SQL
The entire pipeline from raw CSV to queryable data takes seconds, not minutes. And because Parquet files are immutable and self-describing, we get natural versioning and can easily reprocess if needed.
Trade-offs
DuckDB isn't perfect for every use case:
• Not designed for high-concurrency OLTP: if you need thousands of concurrent writes, use PostgreSQL
• Single-node: no distributed computing; for petabyte-scale data, consider ClickHouse or BigQuery
• Memory: complex queries on very large datasets can require significant memory
For FinOps analytics with data volumes in the millions-to-low-billions of rows, DuckDB hits the sweet spot of performance, simplicity, and zero operational overhead.