Building Better Docs - Automating Jekyll Builds and Link Checking for PRs
One of the most important ways that a project can help its developers is providing them with good documentation. Actually, scratch that. Great documentation.
java.lang.ClassNotFoundException: delta.DefaultSource
No great insights in this post, just something for folk who Google this error after me and don’t want to waste three hours chasing their tails… 😄
Here’s a neat little trick you can use with DuckDB to convert a CSV file into a Parquet file:
COPY (SELECT *
FROM read_csv('~/data/source.csv',AUTO_DETECT=TRUE))
TO '~/data/target.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD');
It all started with a tweet.
What do you do when you want to query over multiple parquet files but the schemas don’t quite line up? Let’s find out 👇🏻
As we enter December and 2022 draws to a close, so does a significant chapter in my working career: later this month I’ll be leaving Confluent and moving on to pastures new.
It’s nearly six years since I wrote a 'moving on' blog entry, and as well as sharing what I’ll be working on next (and why), I also want to reflect on how much I’ve benefited from my time at Confluent and particularly the people with whom I worked.
In my quest to bring myself up to date with where the data & analytics engineering world is at nowadays, I’m going to build on my exploration of the storage and access technologies and look at the tools we use for loading and transforming data.
I started my dbt journey by poking and pulling at the pre-built jaffle_shop demo running with DuckDB as its data store. Now I want to see if I can put it to use myself to wrangle the session feedback data that came in from Current 2022. I’ve analysed this already, but it struck me that a particular part of it would benefit from some tidying up - and be a good excuse to see what it’s like using dbt to do so.
I’ve been wanting to try out dbt for some time now, and a recent long-haul flight seemed like the obvious opportunity to do so. Except many of the tutorials with dbt that I found were based on using Cloud, and airplane WiFi is generally sucky or non-existent. Then I found the DuckDB-based demo of dbt, which seemed to fit the bill (🦆 geddit?!) perfectly, since DuckDB runs locally. In addition, DuckDB had appeared on my radar recently and I was keen to check it out.
At Current 2022 the audience was given the option to submit ratings. Here’s some analysis I’ve done on the raw data. It’s interesting to poke about in it, and it also gave me an excuse to try using DuckDB in a notebook!