Interesting links - August 2025
Not got time for all this? I’ve marked 🔥 for my top reads of the month :)
You’ve got data in Apache Kafka.
You want to get that data into Apache Iceberg.
What’s the best way to do it?
Perhaps inevitably, the answer is: IT DEPENDS. But fear not: here is a guide to help you navigate your way to choosing the best solution for you 🫵.
This is a quick blog post to remind me how to connect Apache Flink to a Kafka topic on Confluent Cloud. You may wonder why you’d want to do this, given that Confluent Cloud for Apache Flink is a much easier way to run Flink SQL. But, for whatever reason, you’re here and you want to understand the necessary incantations to get this connectivity to work.
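For reference, the shape of those incantations looks something like this. It's a minimal sketch, assuming the standard Flink Kafka SQL connector; the topic, bootstrap server, and API key/secret are placeholders:

```sql
-- Hypothetical table over a Confluent Cloud topic; replace the topic,
-- bootstrap server, and API key/secret with your own values.
CREATE TABLE t_orders (
    order_id STRING,
    amount   DOUBLE,
    ts       TIMESTAMP(3)
) WITH (
    'connector'                    = 'kafka',
    'topic'                        = 'orders',
    'properties.bootstrap.servers' = 'pkc-xxxxx.us-east-1.aws.confluent.cloud:9092',
    -- Confluent Cloud needs SASL_SSL, with the API key and secret
    -- as the JAAS username and password
    'properties.security.protocol' = 'SASL_SSL',
    'properties.sasl.mechanism'    = 'PLAIN',
    'properties.sasl.jaas.config'  = 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";',
    'scan.startup.mode'            = 'earliest-offset',
    'format'                       = 'json'
);
```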
Iceberg nicely decouples storage from ingest and query (yay!). When we say "decouples", it's a fancy way of saying "doesn't do". Which, in the case of ingest and query, is really powerful. It means that we can store data in an open format, populated by one or more tools and queried by those same tools, or by others. Iceberg gets to be very opinionated and optimised around what it was built for (storing tabular data in a flexible way that can be efficiently queried). This is amazing!
But what Iceberg doesn't do is any housekeeping of its data and metadata. This means that getting data in and out of Apache Iceberg isn't where the story stops.
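That housekeeping has to come from an engine outside of Iceberg itself. As a rough illustration, here are the table-maintenance procedures that Iceberg provides for Spark SQL (the catalog and table names are placeholders):

```sql
-- Iceberg won't run these for you; an engine such as Spark has to.

-- Compact the small files that streaming ingest tends to leave behind
CALL my_catalog.system.rewrite_data_files(table => 'db.orders');

-- Expire old snapshots, along with data files only they reference
CALL my_catalog.system.expire_snapshots(
    table      => 'db.orders',
    older_than => TIMESTAMP '2025-08-01 00:00:00');

-- Remove files that no snapshot references at all
CALL my_catalog.system.remove_orphan_files(table => 'db.orders');
```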
Without wanting to mix my temperature metaphors, Iceberg is the new hawtness, and getting data into it from other places is a common task. I wrote previously about using Flink SQL to do this, and today I’m going to look at doing the same using Kafka Connect.
Kafka Connect can send data to Iceberg from any Kafka topic. The source Kafka topic(s) can be populated by a Kafka Connect source connector (such as Debezium), or a regular application producing directly to it.
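As a taster, here's a minimal sketch of a sink configuration, assuming the Apache Iceberg sink connector for Kafka Connect and a REST catalog; the topic, table, URI, and warehouse values are all placeholders:

```properties
# Hypothetical Iceberg sink config (Apache Iceberg sink connector,
# REST catalog); swap in your own topic, table, and catalog details.
connector.class=org.apache.iceberg.connect.IcebergSinkConnector
topics=orders
iceberg.tables=db.orders
iceberg.catalog.type=rest
iceberg.catalog.uri=http://iceberg-rest:8181
iceberg.catalog.warehouse=s3://warehouse/
```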
In this blog post I'll show how you can use Flink SQL to write to Iceberg on S3, storing metadata about the Iceberg tables in the AWS Glue Data Catalog. First off, I'll walk through the dependencies and a simple smoke-test, and then put it into practice, writing data from a Kafka topic to Iceberg.
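The heart of it is the catalog definition. As a sketch, assuming the Iceberg Flink runtime and AWS bundle are on the classpath (the bucket name is a placeholder):

```sql
-- Iceberg catalog backed by AWS Glue, with data on S3
CREATE CATALOG c_iceberg WITH (
    'type'         = 'iceberg',
    'catalog-impl' = 'org.apache.iceberg.aws.glue.GlueCatalog',
    'io-impl'      = 'org.apache.iceberg.aws.s3.S3FileIO',
    'warehouse'    = 's3://my-bucket/warehouse'
);

-- Smoke-test: metadata should land in Glue, data files on S3
CREATE DATABASE c_iceberg.db01;
CREATE TABLE c_iceberg.db01.t_test (c1 STRING);
INSERT INTO c_iceberg.db01.t_test VALUES ('hello');
```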
After a week’s holiday ("vacation", for y’all in the US) without a glance at anything work-related, what joy to return and find that the DuckDB folk have been busy, not only with the recent 1.3.0 DuckDB release, but also a brand new project called DuckLake.
Here are my brief notes on DuckLake.
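Getting started is pleasingly simple. A minimal sketch, assuming DuckDB 1.3.0+ with the ducklake extension; file paths are placeholders:

```sql
INSTALL ducklake;

-- The catalog here is a local DuckDB file; Postgres, MySQL, and SQLite
-- can also hold the metadata.
ATTACH 'ducklake:my_metadata.ducklake' AS my_lake (DATA_PATH 'data_files/');

-- Tables live in the attached catalog; data is written as Parquet
CREATE TABLE my_lake.demo AS SELECT 42 AS answer;
SELECT * FROM my_lake.demo;
```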