Understanding the Parquet file format
Published: September 27, 2021
Apache Parquet is a columnar storage file format used by many systems in the Hadoop ecosystem. This post describes what Parquet is and the tricks it uses to minimise file size. We also discuss how to use Parquet within an R workflow.