24 posts tagged with "Apache Iceberg"

Apache Polaris: How Policy-Managed Table Maintenance Eliminates Iceberg Operational Overhead

February 16, 2026 · 12 min read

Platform Engineering Team

In our previous post, we covered how to control Iceberg file sizes at write time and how to fix small file problems with Iceberg's table maintenance procedures. The conclusion was clear: the tools are excellent, but manually scheduling and managing maintenance across dozens or hundreds of tables does not scale.

This post is about the layer that solves that problem: Apache Polaris — the open-source Iceberg catalog that introduces policy-based table maintenance, letting you define optimization rules once and have them applied automatically across your entire lakehouse.

Mastering Iceberg File Sizes: How Spark Write Controls and Table Optimization Prevent the Small File Nightmare

February 15, 2026 · 13 min read

Cazpian Engineering

Platform Engineering Team

Every data engineer who has worked with Apache Iceberg at scale has hit the same wall: query performance that mysteriously degrades over time. Dashboards that used to load in two seconds now take twenty. Spark jobs that finished in minutes now crawl for an hour. The root cause is almost always the same: thousands of tiny files have silently accumulated in your Iceberg tables.

The small file problem is not unique to Iceberg. But Iceberg gives you an unusually powerful set of tools to prevent it at the write layer and fix it at the maintenance layer. The catch is that most teams never configure these controls properly — or do not even know they exist.

Why Every Data Company Is Betting on Apache Iceberg — And What It Means for AI

February 14, 2026 · 13 min read

Cazpian Engineering

Platform Engineering Team

Something unusual is happening in the data industry. Companies that have spent years — and billions of dollars — building proprietary storage formats are now rallying behind an open-source table format created at Netflix. Snowflake, Databricks, Dremio, Starburst, Teradata, Google BigQuery, AWS — the list keeps growing. They are not just adding Iceberg as a checkbox feature. They are making it central to their platform strategy.

If you are a data engineer, you have almost certainly heard of Apache Iceberg by now. But the more interesting question is not what Iceberg is — it is why every major vendor has decided that their own proprietary format is no longer enough.

Introducing Cazpian: An AWS-first Lakehouse Platform

December 25, 2025 · One min read

We are excited to announce Cazpian, a new kind of data platform built from the ground up for AWS.

Data teams face a constant struggle: managing massive amounts of data without getting bogged down by infrastructure complexity. Cazpian solves this by combining Apache Iceberg and Apache Spark into a seamless, managed experience.