Engineering2024-11-159 min read

Automating OCI Cost Report Ingestion with Python: A Developer's Guide

Learn how to programmatically download, parse, and analyze OCI cost reports using Python, the OCI SDK, and modern data tools.

OT

OCIFinOps Team

If you want to build custom cost analysis pipelines or understand how OCIFinOps works under the hood, this guide walks through the technical process of ingesting OCI cost reports programmatically.

Prerequisites

You'll need:

An OCI tenancy with cost reports enabled

Python 3.9+

The OCI Python SDK

DuckDB (for analysis)

Step 1: Configure OCI Access

OCI cost reports are stored in a special Object Storage bucket managed by Oracle. You need the tenancy OCID and proper IAM policies.

import oci

config = oci.config.from_file() # Uses ~/.oci/config

object_storage = oci.object_storage.ObjectStorageClient(config)

# Cost report bucket details

namespace = "bling" # Oracle's namespace for cost reports

bucket_name = config["tenancy"] # Bucket name = tenancy OCID

Step 2: List Available Reports

def list_cost_reports(object_storage, namespace, bucket_name):

"""List all available cost report files."""

reports = []

next_start = None

while True:

response = object_storage.list_objects(

namespace,

bucket_name,

prefix="reports/cost-csv",

start=next_start,

fields="name,size,timeCreated"

)

reports.extend(response.data.objects)

if response.data.next_start_with:

next_start = response.data.next_start_with

else:

break

return reports

Step 3: Download Reports

import gzip

from pathlib import Path

def download_report(object_storage, namespace, bucket_name, report_name, output_dir):

"""Download and decompress a cost report."""

response = object_storage.get_object(namespace, bucket_name, report_name)

output_path = Path(output_dir) / Path(report_name).name

gz_path = output_path.with_suffix('.csv.gz')

# Save compressed file

with open(gz_path, 'wb') as f:

for chunk in response.data.raw.stream(1024 * 1024):

f.write(chunk)

# Decompress

csv_path = output_path.with_suffix('.csv')

with gzip.open(gz_path, 'rb') as f_in:

with open(csv_path, 'wb') as f_out:

f_out.write(f_in.read())

gz_path.unlink() # Clean up compressed file

return csv_path

Step 4: Convert to Parquet

CSV files are inefficient for analytical queries. Converting to Parquet provides compression and columnar access.

import duckdb

def csv_to_parquet(csv_path, parquet_path):

"""Convert a cost report CSV to Parquet format."""

conn = duckdb.connect()

conn.execute(f"""

COPY (

SELECT * FROM read_csv_auto('{csv_path}',

header=true,

normalize_names=true

)

) TO '{parquet_path}' (FORMAT PARQUET, COMPRESSION ZSTD)

""")

conn.close()

Step 5: Query with DuckDB

def analyze_costs(parquet_dir):

"""Run analytical queries on cost data."""

conn = duckdb.connect()

# Total cost by service

result = conn.execute(f"""

SELECT

product_service,

SUM(cost_my_cost) as total_cost,

COUNT(*) as line_items

FROM read_parquet('{parquet_dir}/*.parquet')

WHERE cost_my_cost > 0

GROUP BY product_service

ORDER BY total_cost DESC

LIMIT 10

""").fetchdf()

print(result)

conn.close()

return result

Step 6: Put It All Together

def main():

config = oci.config.from_file()

client = oci.object_storage.ObjectStorageClient(config)

namespace = "bling"

bucket = config["tenancy"]

output_dir = "./cost_data"

parquet_dir = "./cost_parquet"

Path(output_dir).mkdir(exist_ok=True)

Path(parquet_dir).mkdir(exist_ok=True)

# List and download reports

reports = list_cost_reports(client, namespace, bucket)

for report in reports[-30:]: # Last 30 reports

csv_path = download_report(client, namespace, bucket,

report.name, output_dir)

parquet_path = Path(parquet_dir) / csv_path.stem + ".parquet"

csv_to_parquet(csv_path, str(parquet_path))

csv_path.unlink() # Clean up CSV

# Analyze

analyze_costs(parquet_dir)

if __name__ == "__main__":

main()

What OCIFinOps Adds

This script covers the basics, but OCIFinOps adds significant value on top:

Incremental ingestion: Only downloads new reports, avoiding reprocessing

Schema evolution: Handles changes in cost report format over time

Deduplication: OCI occasionally produces overlapping reports

AI-powered queries: Natural language interface instead of writing SQL

Anomaly detection: Statistical analysis running automatically

Interactive dashboards: Visual exploration without code

Understanding the underlying pipeline helps you appreciate what OCIFinOps automates — and gives you the knowledge to customize if needed.

Ready to optimize your OCI costs?

Start with a free demo and see how OCIFinOps can help.