My Work

This is a selection of my writing and talks. The first section highlights my writing, and the second covers my talks.

How Z-ordering in Apache Iceberg helps improve performance in Cloud Data Lakes

Z-ordering is a multi-dimensional sorting technique that makes searching data more efficient. In this blog, I explain why Z-ordering is needed in Apache Iceberg to achieve significant performance gains when dealing with very large amounts of data.
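As a rough illustration of the idea (a toy sketch, not taken from the blog and not Iceberg's actual implementation), a Z-order value for two integer columns can be computed by interleaving their bits. The function name `interleave_bits` and the 16-bit width are assumptions made for this example only.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return a Z-order (Morton) value for two non-negative integers.

    Hypothetical helper for illustration; assumes both values fit in `bits` bits.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # bits of x land on even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # bits of y land on odd positions
    return z


# Sort a small 4x4 grid of (x, y) points by their Z-order value.
points = [(x, y) for x in range(4) for y in range(4)]
z_sorted = sorted(points, key=lambda p: interleave_bits(p[0], p[1]))
```

Sorting by the interleaved value tends to keep rows that are close in both columns near each other, which is what lets a query engine skip more files when filtering on either column.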

Machine Learning Experimentation & Reproducibility in Data Lakehouse using Arctic/Nessie

Model experimentation and reproducibility are two critical aspects of machine learning. In this hands-on blog, I go over how to achieve both in a Data Lakehouse using a data-as-code approach with Arctic/Project Nessie.

Build a Data Lakehouse using everything open-source: Apache Iceberg, Project Nessie & Apache Spark

This project provides a Docker image to build and run a functional, minimal data lakehouse architecture using only open-source components. I use Apache Iceberg as the table format on top of Amazon S3, with Nessie as the catalog and Apache Spark as the compute engine.

Getting Started with Flink SQL and Apache Iceberg

In this blog, I give a hands-on walkthrough of combining Apache Flink and Apache Iceberg to build real-time data lakehouses.

Talks