My Work
This is a selection of my writings and talks. The first section highlights my writings, and the second covers my talks.
How Z-ordering in Apache Iceberg helps improve performance in Cloud Data Lakes
Z-ordering is a multi-dimensional sorting technique that keeps related data close together, making searches over it more efficient. In this blog, I explain why Z-ordering in Apache Iceberg is needed to achieve significant performance gains when dealing with huge amounts of data.
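As a rough illustration of the idea only (this is not Iceberg's implementation or code from the blog), Z-ordering maps values from several columns to a single sort key by interleaving their bits, producing what is known as a Morton code. The sketch below assumes two non-negative integer columns that fit in 16 bits. The helper names `interleave_bits` and `z_order_sort` are hypothetical.

```python
from typing import List, Tuple


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Morton (Z-order) code for (x, y).

    Assumption: x and y are non-negative integers below 2**bits.
    Bit i of x goes to position 2*i and bit i of y goes to 2*i + 1.
    """
    limit = 1 << bits
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError(f"x and y must be in [0, {limit})")
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # even bits come from x
        z |= ((y >> i) & 1) << (2 * i + 1)   # odd bits come from y
    return z


def z_order_sort(points: List[Tuple[int, int]], bits: int = 16) -> List[Tuple[int, int]]:
    """Sort (x, y) pairs by their Z-order code (hypothetical helper)."""
    return sorted(points, key=lambda p: interleave_bits(p[0], p[1], bits))


if __name__ == "__main__":
    grid = [(x, y) for y in range(4) for x in range(4)]
    print(z_order_sort(grid)[:4])  # → [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Sorting by this key places points that are near each other in both dimensions near each other in the sort order. For example, the 2x2 block at the origin comes out first and stays contiguous. When rows are laid out in files this way, per-file min/max statistics tend to cover narrower ranges on every clustered column, which is what lets an engine skip more files for filters on any of those columns.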
Machine Learning Experimentation & Reproducibility in Data Lakehouse using Arctic/Nessie
Model experimentation and reproducibility are two critical aspects of the machine learning world. In this hands-on blog, I walk through how to achieve both in a Data Lakehouse using a data-as-code approach with Arctic/Project Nessie.
Build a Data Lakehouse using everything open-source: Apache Iceberg, Project Nessie & Apache Spark
This project provides a Docker image to build and run a functional, minimal data lakehouse architecture using only open-source components. I use Apache Iceberg as the table format on top of Amazon S3, Project Nessie as the catalog, and Apache Spark as the compute engine.