Mar 25, 2025

Confluent Cloud for Apache Flink - Exploring the API

Confluent Cloud for Apache Flink lets you run Flink workloads on a serverless platform in Confluent Cloud. After poking around the Confluent Cloud API for configuring connectors, I wanted to take a look at what it offers for Flink.

The API is particularly useful if you want to script a deployment, or automate a bulk operation that would otherwise be tiresome to do by hand. It’s also handy if you just prefer living in the CLI :)

Mar 24, 2025

Interesting links - March 2025

The problem with publishing February’s interesting links at the beginning of the month and only now getting around to publishing March’s at the end is that I have nearly two months' worth of links to share 😅 So, without further ado, let’s crack on.

Mar 21, 2025

How to create Carousel posts in LinkedIn…without the bullshit

tl;dr: Upload a PDF document in which each slide of the carousel is one page.

I wanted to create a carousel post on LinkedIn, but had to wade through a million pages of crap in Google from companies trying to sell shit. Here’s how to do it simply.

Mar 20, 2025

Building a data pipeline with DuckDB

In this blog post I’m going to explore how, as a data engineer in the field today, I might go about putting together a rudimentary data pipeline. I’ll take some operational data and wrangle it into a form that is easy to work with for analytics.

After a somewhat fevered and nightmarish period during which people walked around declaring that "Schema on Read" was the future, that "Data is the new oil", and "Look at the size of my big data", IT history is coming full circle, back to a more sensible approach to things.

As they say:

What’s old is new

This is good news for me, because I am old and what I knew then is 'new' now ;)

Mar 19, 2025

Exporting Notebooks from DuckDB UI

DuckDB added a very cool UI last week and I’ve been using it as my primary interface to DuckDB since.

One thing that bothered me was that the SQL I was writing in the notebooks wasn’t exportable. Since the UI stores its notebooks in DuckDB behind the scenes, getting the SQL out is easy enough.

Mar 14, 2025

Kicking the tyres on the new DuckDB UI

I wrote a couple of weeks ago about using DuckDB and Rill Data to explore a new data source that I’m working with. I wanted to understand the data’s structure and distribution of values, as well as how different entities related. This week DuckDB 1.2.1 was released and that little 0.0.1 version boost brought with it the DuckDB UI.

Here I’ll go through the same process as before, and see how much of what I was doing can now be done in DuckDB alone.

Mar 13, 2025

Creating an HTTP Source connector on Confluent Cloud from the CLI

In this blog article I’ll show you how to use the confluent CLI to set up a Kafka cluster on Confluent Cloud, create the necessary API keys, and then configure a managed connector. The connector I’m setting up is the HTTP Source (v2) connector. It’s part of a pipeline that I’m working on to pull in a feed of data from the UK Environment Agency for processing. The data is spread across three endpoints, and one of the nice features of the HTTP Source (v2) connector is that a single connector can pull data from more than one endpoint.

Mar 13, 2025

Why is kcat showing the wrong topics?

Much as I love kcat (🤫 it’ll always be kafkacat to me…), this morning I nearly fell out with it 👇

😖 I thought I was going stir crazy after listing the topics on one broker and seeing topics from a different broker.

😵 WTF 😵

Mar 11, 2025

Write more blog articles, not fewer (Don’t leave the scraps on the cutting floor)

Some would say that the perfect blog article takes the reader on a journey in which the development process looks like this:

Mar 10, 2025

Data Wrangling with Flink SQL

The UK Government publishes a lot of its data as open feeds. One that I keep coming back to is the Environment Agency’s flood-monitoring API, which gives access to an estate of sensors that provide data such as river levels and rainfall.

The data is well-structured and provided across three primary API endpoints. In this blog article I’m going to show you how I use Flink SQL to explore the data and wrangle it into a form from which I can then build a streaming pipeline.

rmoff’s random ramblings

✨ Data Engineering, Kafka, and other random geekery 🤓
