Here’s a neat little trick you can use with DuckDB to convert a CSV file into a Parquet file:
COPY (
    SELECT *
    FROM read_csv('~/data/source.csv', AUTO_DETECT=TRUE)
)
TO '~/data/target.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD');
You can also change the schema on the way through by selecting only the columns you need and renaming them:
COPY (
    SELECT col1, col2, col3 AS foo
    FROM read_csv('~/data/source.csv', AUTO_DETECT=TRUE)
)
TO '~/data/target.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD');
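If you want to check the result, one option is to read the Parquet file back in DuckDB. The queries below are a rough sketch that assumes the same target path as above; they only inspect the file and do not change it.

```sql
-- Show the column names and types that ended up in the Parquet file
DESCRIBE SELECT * FROM read_parquet('~/data/target.parquet');

-- Count the rows, to compare against the source CSV
SELECT count(*) AS row_count
FROM read_parquet('~/data/target.parquet');

-- Check which compression codec the column chunks were written with
SELECT DISTINCT compression
FROM parquet_metadata('~/data/target.parquet');
```

Comparing the row count with the same count over read_csv on the source file is a quick way to spot rows that were dropped or mis-parsed during auto-detection.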