What is Data Lakehouse?
Arockia Liborious, Principal Consultant at Clariant, defines the lakehouse as a two-tier architecture that combines features of traditional architectures such as the data warehouse and the data lake. According to him, companies today build products that use AI in the form of computer vision, voice models, text mining and more, whereas earlier architectures relied only on structured data to make business decisions. S&P Global’s Matt Aslett similarly writes that the data lakehouse “blurs the lines between data lakes and data warehousing by maintaining the cost and flexibility advantages of persisting data in cloud storage while enabling schema to be enforced for curated subsets of data in specific conceptual zones of the data lake, or an associated analytic database, in order to accelerate analysis and business decision-making.”
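To make the idea of enforcing a schema only on curated subsets more concrete, here is a minimal Python sketch. It is a toy model and not how any particular lakehouse product works: the `Lakehouse` class, the `conforms` helper and the `ORDERS_SCHEMA` columns are all hypothetical names invented for illustration. The raw zone accepts every record, as a data lake would, while only records that match the schema are promoted to the curated zone.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical schema for a curated "orders" table: column name -> required type.
ORDERS_SCHEMA = {"order_id": int, "customer": str, "amount": float}


def conforms(record: dict[str, Any], schema: dict[str, type]) -> bool:
    """Return True if the record has exactly the schema's columns with matching types."""
    if set(record) != set(schema):
        return False
    return all(isinstance(record[col], typ) for col, typ in schema.items())


@dataclass
class Lakehouse:
    """Toy model (an assumption): one permissive raw zone, one schema-enforced curated zone."""
    schema: dict[str, type]
    raw: list[dict[str, Any]] = field(default_factory=list)
    curated: list[dict[str, Any]] = field(default_factory=list)

    def ingest(self, record: dict[str, Any]) -> None:
        # The raw zone keeps everything, structured or not.
        self.raw.append(record)
        # Only records that satisfy the schema reach the curated zone.
        if conforms(record, self.schema):
            self.curated.append(record)


lake = Lakehouse(ORDERS_SCHEMA)
lake.ingest({"order_id": 1, "customer": "acme", "amount": 19.5})
lake.ingest({"order_id": "2", "customer": "globex"})  # wrong type and a missing column
lake.ingest({"transcript": "voice note text"})        # unstructured payload

print(len(lake.raw), len(lake.curated))
```

All three records stay available in the raw zone for AI workloads such as text mining, while business reporting would read only from the curated zone.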
What is Data Mesh?
Data Mesh, on the other hand, is considered a “paradigm shift” in the data science industry. Under the cleverly chosen title ‘From data mess to data mesh’, Jarvin Mutatiina and Ernst Blaauw of Deloitte explain that the growing number of data sources, together with the need for agility, calls for a more effective data platform than the traditional ones. They describe data mesh as a “democratized approach of managing data where different business domains operationalize their own data, backed by a central and self-service data infrastructure”. It is seen as more of an organisational approach than a technical one.
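The split between domain ownership and a central, self-service infrastructure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `DataProduct`, `Catalog`, and the domain and product names are hypothetical and do not come from Deloitte or from any data mesh tool. Each domain publishes its own data products, and the shared catalog only records who owns what.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A dataset published and maintained by one business domain (hypothetical model)."""
    name: str
    owner_domain: str
    description: str


@dataclass
class Catalog:
    """Hypothetical central, self-service registry shared by all domains."""
    products: dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        # Product names are global so any consumer can discover them;
        # a domain may update its own product but not overwrite another domain's.
        existing = self.products.get(product.name)
        if existing is not None and existing.owner_domain != product.owner_domain:
            raise PermissionError(f"{product.name} is owned by {existing.owner_domain}")
        self.products[product.name] = product

    def owned_by(self, domain: str) -> list[str]:
        # List the product names a given domain is responsible for.
        return sorted(p.name for p in self.products.values() if p.owner_domain == domain)


catalog = Catalog()
catalog.publish(DataProduct("orders", "sales", "Confirmed customer orders"))
catalog.publish(DataProduct("shipments", "logistics", "Outbound shipments"))
catalog.publish(DataProduct("refunds", "sales", "Processed refunds"))

print(catalog.owned_by("sales"))
```

The catalog stores no data itself. In this sketch, responsibility for quality and governance stays with the owning domain, which is the organisational point the definition makes.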
A simple comparison
Data Mesh versus Data Lakehouse
Zhamak Dehghani, who coined the term data mesh, said at the Data+AI Summit, “I don’t really see them [Lakehouse and Mesh] exclusive, I see them as complementary”.
In the same vein, organisations such as Cloudera employ a hybrid, multi-cloud model in their modern data architectures. For example, Luke Roquet of Cloudera writes, “Modern data platforms deliver an elastic, flexible, and cost-effective environment for analytic applications by leveraging a hybrid, multi-cloud architecture to support data fabric, data mesh, data lakehouse and, most recently, data observability.”
However, several others have called the buzz around data mesh merely a marketing gimmick.
“A ‘data mesh’ is just the net you use to catch data fish in your data lake out of the data boat you keep at your data lakehouse.” - Jim Crist-Harif
Bill Schmarzo likewise argues that one of the biggest problems with the data mesh architecture is that it requires everyone to become a data management and data governance expert. And, as many have pointed out, embedding data management and governance in each business unit of an organisation also costs considerable time and money.
Sawyer Nyquist at Microsoft, in an attempt to “clear out the noise and marketing hype” around these architectures, says that for more than 95% of companies a data warehouse or data lakehouse is the right solution, and that only the top 5%, the largest companies in the world, need to worry about data mesh.
For now, experts offer different answers to the question of ‘lakehouse versus mesh’. Some propose a hybrid model, whereas others are cautious about adopting mesh as the data platform in small and medium-sized organisations.