Iceberg on AWS: S3FileIO, Glue Catalog, and Performance Optimization Guide
If you are running Apache Iceberg on AWS, the single most impactful configuration decision you will make is your choice of FileIO implementation. Most teams start with HadoopFileIO and s3a:// paths because that is what their existing Hadoop-based stack already uses. It works, but it leaves significant performance on the table.
Iceberg's native S3FileIO was built from the ground up for object storage. It uses the AWS SDK for Java v2 directly, skips the Hadoop filesystem abstraction entirely, and implements optimizations that s3a cannot: progressive multipart uploads, native bulk deletes, and zero serialization overhead. Teams that switch typically see faster writes, faster commits, and lower memory usage across the board.
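To make the switch concrete before the details, here is a minimal sketch of the Spark catalog properties that select S3FileIO together with a Glue catalog. The helper name iceberg_glue_catalog_conf, the catalog name "glue", and the bucket path are hypothetical placeholders chosen for illustration; the property keys and class names are the standard Iceberg ones.

```python
# Sketch only: builds the Spark conf entries for an Iceberg catalog that uses
# GlueCatalog for metadata and S3FileIO for data access.
# The function name, catalog name, and warehouse path are illustrative assumptions.

def iceberg_glue_catalog_conf(catalog_name: str, warehouse: str) -> dict:
    """Return Spark conf key/value pairs for one Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Register the catalog with Spark.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Use AWS Glue as the catalog implementation.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Use Iceberg's native S3FileIO instead of HadoopFileIO.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Root location for table data and metadata (placeholder bucket).
        f"{prefix}.warehouse": warehouse,
    }


if __name__ == "__main__":
    conf = iceberg_glue_catalog_conf("glue", "s3://example-bucket/warehouse")
    # Print the entries in the form accepted by spark-submit.
    for key, value in conf.items():
        print(f"--conf {key}={value}")
```

Each pair can be passed to spark-submit as a --conf flag or set with SparkSession.builder.config(key, value). This assumes the Iceberg Spark runtime and the AWS SDK dependencies are already on the classpath.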
This post covers everything you need to run Iceberg on AWS efficiently: why S3FileIO outperforms s3a, how to configure every critical property, how to avoid S3 throttling, how to set up the Glue catalog correctly, and how to secure your tables with encryption and credential vending.