Trino on Ice: Using Iceberg To Replace the Hive Table Format

Speakers:  Brian Olsen

Brian Olsen - Percona Live 2021 - Altinity Community Track

Trino (formerly PrestoSQL) is a ludicrously fast query engine that evolved from the need to replace the slow query turnaround speeds of the Hive engine. Trino grew in popularity under the label of Presto for years as an interactive query engine that lives over your data lake. While this operation was certainly a step away from the initial big data days of waiting hours to days for queries to complete, there were still many tedious rules engineers had to follow in order to correctly create, manage, and use their data in the datalake due to the Hive table format.

Apache Iceberg, a table format created at Netix, aims to address many of these issues. Iceberg simplifies the life of the engineer by decoupling the logical view of the data from the physical layout of the data using techniques like hidden partitioning and allows for in-place schema migration of your tables. Iceberg also increases the speed at which you can query your system by tracking files at the le level versus the partition level and so much more. 2 of 3 Speakers

With the marrying of Trino and Iceberg, companies can take advantage of a full replacement of the big data days of old and move into the next generation of datalakes that simplify the mental load of their data engineers and focus on building out the business logic and other tasks. In this talk I will cover some of the examples of the issues Iceberg solves from the lens of Trino.

Video

Link: https://youtu.be/5-Q74rCX2Z8

Speakers

Brian Olsen

Starburst, Developer Advocate

Brian is a U.S. Marine turned software engineer and developer advocate working to foster the open-source Trino community. Brian spent four years as a data engineer at a cybersecurity company working on pipeline maintenance and query optimization. While in this role, Brian was responsible for maintaining data pipelines and migrations to include replacing some legacy data warehousing systems to use open-source Trino. Brian is a published author in ACM and IEEE geospatial database conferences.

See all talks by Brian Olsen »

✎ Edit this page on GitHub