Data Engineering: Resources
As I’ve been reading and exploring the current world of data engineering I’ve been adding links to my Raindrop.io collection, so check that out. In addition, below are some specific resources that I’d recommend.
In this article I look at where we store our analytical data, how we organise it, and how we enable access to it. I’m considering here potentially large volumes of data for access throughout an organisation. I’m not looking at data stores that are used for specific purposes (caches, low-latency analytics, graph, etc.).
The article is part of a series in which I explore the world of data engineering in 2022 and how it has changed from when I started my career in data warehousing 20+ years ago. Read the introduction for more context and background.
For the past 5.5 years I’ve been head-down in the exciting area of stream processing and events, and I realised recently that the world of data and analytics that I worked in up to 2017, which was changing significantly back then (Big Data, y’all!), has evolved and, dare I say it, matured somewhat - and I’ve not necessarily kept up with it. In this series of posts you can follow along as I start to reacquaint myself with where it’s got to these days.
Airtable is a rather wonderful tool. It powers the program creation backend process for Kafka Summit and Current. It does, however, have a few frustrating limitations - often where it feels like a feature was built on a Friday afternoon and they didn’t get the chance to finish it before knocking off to head to the pub.
If you’ve ever been to a conference, particularly as a speaker who’s submitted a paper that may or may not have been accepted, you might wonder quite how conferences choose the talks that get accepted.
I had the privilege of chairing the program committee for Current and Kafka Summit this year and curating the final program for both. Here’s a glimpse behind the curtains of how we built the program for Current 2022. It was originally posted as a thread on Twitter.
Lightning talks are generally 5-10 minutes. As the name implies - they are quick!
A good lightning talk is not just your breakout talk condensed into a shorter time frame. You can’t simply deliver the same material faster, or the same material at a higher level, or the same material with a few bits left out.
Building the program for any conference is not an easy task. There will always be a speaker disappointed that their talk didn’t get in, or perhaps an audience who are disappointed that a particular talk did get in. As the chair of the program committee for Current 22, one of the things that I’ve found really useful in building out the program this time round is the comments that the program committee left against submissions as they reviewed them.
There were some common patterns I saw, and I thought it would be useful to share these here. Perhaps you’re an aspiring conference speaker looking to understand what mistakes to avoid. Maybe you’re an existing speaker whose abstracts don’t get accepted as often as you’d like. Or perhaps you’re just curious as to what goes on behind the curtains :)
I’m convinced that a developer advocate can be effective remotely. As a profession, we’ve all spent two years figuring out how to do just that. Some of it worked out great. Some of it, less so.
I made the decision during COVID to stop travelling as part of my role as a developer advocate. In this article, I talk about my experience with different areas of advocacy done remotely.
I recently started writing an abstract for a conference later this year and realised that I’m not even sure if I want to do it. Not the conference—it’s a great one—but just the whole up on stage doing a talk thing. I can’t work out if this is just nerves from the amount of time off the stage, or something more fundamental to deal with.
This blog is written in AsciiDoc, built using Hugo, and hosted on GitHub Pages. I recently wanted to share the draft of a post I was writing with someone and ended up exporting a local preview to a PDF - not a great workflow! This blog post shows you how to create an automagic hosted preview of any draft content on Hugo using GitHub Actions.
This is useful for previewing and sharing one’s own content, but also for making good use of GitHub as a collaborative platform - if someone reviews and amends your PR, the post gets updated in the preview too.