Some of my favourite public data sets

Published in Datasets, Sample Data at https://preview.rmoff.net/2020/09/25/some-of-my-favourite-public-data-sets/

Readers of a certain age and RDBMS background will probably remember the Northwind, HR, or OE databases - or quite possibly not just remember them but still be using them. Hardcoded sample data is fine, and it’s great for repeatable tutorials and examples - but it’s boring as heck if you want to build an example without reaching for the same data set for the 100th time.

I’ve written before about one of my favourite resources for mocking data, Mockaroo, and how you can even use it to stream mock data into Kafka. Other mock data generators for Kafka include kafka-connect-datagen and Voluble.

Sometimes, though, you just want some real, live, warts-and-all data. Fortunately, governments and public bodies have made a real shift towards open data in recent years. Here is a list of some of my favourite (UK-centric) resources. Many have a mix of live and static datasets.


What are your go-to sources for real data? Let me know and I’ll add them to this list.