Checkpoint Chronicle - February 2024

Published in Apache Flink at https://preview.rmoff.net/2024/02/22/checkpoint-chronicle-february-2024/

Note
This post originally appeared on the Decodable blog.

Welcome to the Checkpoint Chronicle, a monthly roundup of interesting stuff in the data and streaming space. Your hosts and esteemed curators of said content are Gunnar Morling and Robin Moffatt (your editor-in-chief for this edition). Feel free to send our way any choice nuggets that you think we should feature in future editions.

Stream Processing, Streaming SQL, and Streaming Databases

Event Streaming

Change Data Capture

Data Platforms and Architecture

Data Ecosystem

  • The Modern Data Stack is a moniker that’s been ubiquitous for several years now, and one to which any data tool vendor worth its salt would try to hitch its wagon. That is, until last week, when Tristan Handy at dbt wondered out loud whether the "Modern Data Stack" is still a useful idea. This spawned a series of response articles from names synonymous with the space, including Joe Reis and Benn Stancil.

  • DocStore is a distributed database built at Uber, offering strong consistency, caching with Redis, CDC, and the ability to serve over 40 million reads per second.

  • Part of my fun with Flink catalogs (that I mention above) was reacquainting myself with the Hive Metastore. My former colleague Oz Katz has a good article exploring the options in this space now and looking at how some of the new ones aren’t entirely open, or have elements of vendor lock-in.

  • Real-time analytics is a hot space with many active projects and vendors. Whilst both Vimeo and Lyft have embraced ClickHouse (moving from Apache Phoenix on HBase and Apache Druid respectively), Uber uses Apache Pinot at scale.

  • Daniel Beach is a data engineer at Rippleshot and prolific blogger. A few of his articles that I’ve enjoyed recently are Config Driven Pipelines, Are Data Contracts For Real?, and Batch vs Near-Realtime vs Streaming.

Papers of the Month

Murat Demirbas has a fascinating blog in which he analyses papers that have been published. Two papers that caught my eye recently are:

Events & Call for Papers (CfP)

New Releases

There are also a couple of releases that are almost there but not quite at the time of going to press 🙂

  • flink-connector-jdbc-3.1.2 RC3 vote has passed, and so the release is imminent (this will add support for Flink 1.18 to the connector)

  • Apache Kafka 3.7 RC4 vote is underway. This release includes a bunch of new stuff such as a Docker image for Kafka (KIP-975), Kafka Connect supporting the creation of connectors in a stopped state (KIP-980), and in Kafka Streams support for rack-aware task assignment (KIP-925) plus a bunch of improvements to Interactive Queries v2 (KIP-968, KIP-985, KIP-992).

That’s all for this month! We hope you’ve enjoyed the newsletter and would love to hear about any feedback or suggestions you’ve got.

Gunnar ( LinkedIn / X / Mastodon / Email )
Robin ( LinkedIn / X / Mastodon / Email )

