Databricks open sources Unity Catalog, unifying data & AI governance
Databricks has announced it is open sourcing Unity Catalog, which it describes as the industry's only unified solution for data and artificial intelligence (AI) governance across multiple clouds, data formats, and platforms. The move builds on Databricks' emphasis on open ecosystems, offering customers greater flexibility and control without the risk of vendor lock-in. The initiative is backed by several major technology companies, including Amazon Web Services (AWS), Google Cloud, Microsoft, NVIDIA, and Salesforce.
Unity Catalog OSS features a universal interface that supports multiple data formats and compute engines, including the ability to read tables with Delta Lake, Apache Iceberg, and Apache Hudi clients via Delta Lake UniForm. It also supports the Iceberg REST Catalog and Hive Metastore (HMS) interface standards. The platform aims to provide unified governance for tabular and non-tabular data, as well as AI assets such as machine learning (ML) models and generative AI tools, simplifying management at scale.
Unity Catalog was first introduced in 2021 to address the need for an interoperable catalogue for data and AI workloads. Traditionally, organisations relied on numerous single-purpose solutions, which created silos between platforms and between data and AI assets. These silos complicated the development of modern data and AI applications and made metadata, data access, and governance harder to manage. According to Databricks, Unity Catalog has broken down these silos for over 10,000 organisations.
"Our customers love Unity Catalog. It lets them manage all their data objects—tabular data, unstructured data, and AI and ML assets—in a single source of truth within the Databricks Data Intelligence Platform, versus gluing together multiple single-purpose solutions," said Ali Ghodsi, Co-founder and CEO at Databricks. "Our platform is the only major data platform in the industry where all data is in an open format by default—now, metadata and governance are open as well, giving enterprises the governance solution they need in today's data and AI landscape. We're excited to open source Unity Catalog and release the code. We'll continue to evolve the open standard in close collaboration with our partners."
Unity Catalog OSS supports a broad range of cloud platforms, including Microsoft Azure, AWS, Google Cloud, and Salesforce; compute engines such as Apache Spark, Presto, Trino, DuckDB, Daft, PuppyGraph, and StarRocks; and data and AI platforms including dbt Labs, Confluent, Eventual, Fivetran, Granica, Immuta, Informatica, LanceDB, LangChain, Tecton, and Unstructured. The platform also exposes open APIs, which widen customer choice by enabling interoperability across different engines, tools, and platforms.
Leading figures from supporting organisations expressed their enthusiasm for Unity Catalog OSS. "AWS welcomes Databricks' move to open source Unity Catalog," said Chris Grusz, Managing Director of Technology Partnerships at AWS. "AWS is committed to working with the industry on open source solutions that enable choice and interoperability for customers."
Ritika Suri, Director of Data and AI Technology Partnerships at Google Cloud, noted, "Google is committed to open, flexible solutions that empower customers to maximise the value of their data. Databricks' strategy to open up the Unity Catalog standard for data and AI aligns very well with our strategy." Jessica Hawk, Corporate Vice President for Data, AI, Digital Applications at Microsoft, added, "Microsoft is committed to the open-source community and empowering customers with choice. Databricks has been a strategic partner for years and it's great to see them open-sourcing Unity Catalog."
Other industry leaders praised the open-sourcing move as well. Matt Dugan, VP of Data Platforms at AT&T, expressed optimism: "The announcement of Unity Catalog's open sourcing encourages us that lakehouse governance and metadata management through open standards will be feasible." Jason Shiverick, Director of AI Platforms at Rivian, stated, "We are excited about Databricks open sourcing Unity Catalog and releasing open APIs to bring interoperability across our data landscape without any concerns of vendor lock-in."