Snowflake vs. Databricks: Two Cloud Giants Battling in the AI Domain

The last week of June was a big one in the data and AI world, marking the official entry into the AI platform race by Databricks and Snowflake. With dueling conferences in different cities, each company went on the offensive to demonstrate its technology roadmap enabling enterprises to leverage the power of LLMs and AI. Over the past decade, Snowflake and Databricks have been friend and foe, but last week they made it glaringly obvious they are now arch-competitors in the new battleground of AI.

It should be no surprise that the majority of the discussions and announcements at both conferences involved Generative AI. The central theme was that to have a generative AI strategy, every company has to start with a data strategy. Unsurprisingly, Databricks and Snowflake are each making the case that they are best positioned to assist customers in that journey.

How did two companies that began life at different parts of the value chain — and at one point even enjoyed a strategic partnership — evolve into such fierce competitors in this new age of AI?

Let’s dig in.

[Quick disclaimer: Madrona invested in Snowflake’s Series C and still holds some shares in the company.]


Snowflake: Data Warehouse to Data Cloud

Snowflake was founded in 2012 by Benoît Dageville and Thierry Cruanes, two database experts who had previously spent many years at Oracle, where they made the astute observation that most data warehouses were “rigid, expensive and difficult to use.” Dageville and Cruanes teamed up with Marcin Zukowski, former CEO of Vectorwise (now Actian Vector), to build the data warehouse of the future based on three fundamental premises: 1) a fully cloud-based architecture; 2) the separation of compute from storage to allow near limitless scaling; and 3) elasticity in how computing resources are used, resulting in unprecedented speeds in query processing and flexibility.

Today, Snowflake has evolved from “simply” a cloud data warehouse into a “Data Cloud,” a single platform for customers to access, build, collaborate on, and monetize their data. In just over a decade, it has grown into a $55B market cap public company serving 6,000+ customers and much of the Fortune 500. Having muscled its way alongside the major hyperscalers (Azure, AWS, and GCP), Snowflake has now clearly set its sights on gaining more mindshare in artificial intelligence.

To do so, it has made several acquisitions and product launches in AI and ML, including:

  • Snowpark allows data scientists to work with their preferred programming languages to enable end-to-end ML workload development, deployment, and orchestration. Customers can ingest, analyze, and transform their data to train ML models and run more predictive analytics.
  • Streamlit is a data-based app builder that Snowflake acquired for $800M in March 2022, allowing customers to develop data-intensive apps with only a few lines of code. Streamlit simplifies the process of contextualizing data analytics tasks and ML model outputs through front-end web applications.
  • Neeva is a search startup that Snowflake acquired earlier this year in a push to accelerate how businesses search and interact with their data, particularly in a more conversational way.
Snowflake’s “Data Cloud”

Databricks: Building The Lakehouse

Databricks was founded in 2013, just one year after Snowflake. Unlike Snowflake, which was founded by industry practitioners, Databricks was founded by a group with deep roots in academia and the open-source community. Its seven original cofounders, including current CEO Ali Ghodsi, were researchers at UC Berkeley’s AMP Lab, where they conceived Apache Spark, an open-source unified analytics engine for large-scale data processing. Spark has grown into one of the largest and most widely used data processing frameworks, powering data engineering, data science, and machine learning at scale.

Databricks was initially formed to commercialize Spark, introducing an enterprise-grade version of Spark with all the features (governance, support, hosting, etc.) that large organizations required. Databricks has since evolved into the novel “Lakehouse Platform,” unifying data, analytics, and AI. The unified Lakehouse concept brings together “one platform for integration, storage, processing, governance, sharing, analytics, and AI.”

In the past ten years, Databricks has become one of the world’s most highly valued private companies, last valued at $38B in 2021 and recently crossing the $1B revenue milestone. It serves thousands of enterprise customers and open-source users, and its IPO is considered one of the most hotly anticipated. Throughout all of this growth, it has increasingly positioned itself as a leader in AI and recently made key acquisitions and announcements, including acquiring MosaicML for $1.3B (covered more below) and open-sourcing Dolly, an instruction-tuned LLM trained for less than $30.

Databricks’ “Lakehouse” Platform

Snowflake vs. Databricks: Colliding in AI

Snowflake and Databricks are well-positioned to continue capitalizing on long-term secular trends as enterprises position for the Generative AI paradigm shift. With the proliferation of generative AI applications, both companies are trying to position themselves as strategic multiproduct data platforms. Below, we highlight a few of the major recent announcements and our thoughts on each company’s overall AI strategy.

Snowflake Major Announcements
  • Developer Announcements
    • Snowflake’s Native App Framework: This is a new way of putting data to work by allowing developers to create, distribute, and monetize applications that can all scale with Snowflake’s Data Cloud.
    • Snowpark Container Services: Extends Snowflake’s data programmability and compute infrastructure with support for programming languages, access to third-party software, and enhanced security and governance for hosting full-stack apps and LLMs. It adds flexibility by generalizing Snowflake’s compute platform so that customers can run a full-stack application end to end, from the data layer at the bottom of the stack up to the UI layer.
    • Other Notable Announcements: Snowpipe streaming capabilities; Dynamic Tables (formerly known as Materialized Tables); Document AI (a new service to extract data from unstructured documents); and Iceberg Tables.
  • Partnership Announcements: Snowflake announced several notable partnerships with Nvidia, Microsoft, and Weights & Biases.
    • With Nvidia, Snowflake plans to embed the company’s NeMo enterprise developer framework into its Data Cloud, allowing Snowflake customers to build and deploy LLMs and AI-driven applications leveraging proprietary data that resides in Snowflake.
    • With Microsoft, Snowflake is extending its partnership with Azure to focus on new product integrations around Azure OpenAI and Azure AI/ML services. The partnership has the potential to bring more workloads and customers into the Data Cloud.
    • With Weights & Biases, a leading MLOps platform, Snowpark Container Services enables Weights & Biases to accelerate the iterative development of ML models, LLMs, and LLM-powered applications in the Snowflake Data Cloud. Ultimately, this partnership will help enterprises and users more easily build and leverage generative AI.
    • Beyond these three, Snowflake announced several other partnerships with companies like Alteryx, Hex, Dataiku, RelationalAI, Pinecone, and more.
Snowflake: Our Take

Until very recently, Snowflake did not reveal any plans for adding generative AI to its existing capabilities, and many investors have expressed concern that Snowflake is being out-competed in this space (particularly by Databricks). However, at the 2023 Summit, Snowflake landed a strong story around their vision to be a platform for generative AI, positioning themselves as the trusted data cloud provider.

Snowflake’s partnership with Nvidia and the announcement of Snowpark Container Services help give the company a foothold as a viable player in the AI data stack. Its driving message is that it can enable customers to securely access, develop, and deploy LLMs and AI-driven applications within the Snowflake Data Cloud while providing access to accelerated computing with Nvidia GPUs and AI software.

Snowflake’s ability to build its position in the market and with customers who entrust the company with their data is evident, and it is building the toolset to be a real competitor in the world of AI.

Snowpark Container Services overview
Databricks Major Announcements
  • Developer Announcements
    • LakehouseIQ: An LLM-powered natural language interface for searching and querying data. It builds an understanding of a customer’s data, internal jargon, and usage patterns by drawing on the customer’s schemas, documents, queries, lineage, and more.
    • LakehouseAI: Databricks announced several new capabilities around Databricks ML, including LLMOps capabilities that span bringing data together, preparing datasets for ML, fine-tuning and curating ML models, and deploying the models themselves. Databricks also announced several features around vector search, feature serving, and MLflow Gateway.
    • MosaicML: Just before the Summit kickoff, Databricks announced a $1.3B acquisition of MosaicML, which during the Summit was positioned as the “machine to build your GenAI models.”
  • Other Notable Announcements: Delta Lake 3.0; MLflow 2.5, with support for working across different backend LLMs; Lakehouse Apps; and intelligent monitoring with Databricks Lakehouse Monitoring.
Databricks: Our Take

Databricks has taken a unification approach to AI by bringing together data, AI models, and monitoring and governance capabilities into the Lakehouse platform. As a result, Databricks has enabled customers to develop their GenAI solutions more efficiently, and customers view them as a trusted partner that is, on average, faster, cheaper, and easier to use to facilitate ML development.

While already considered a key player in the AI stack, Databricks has strengthened its position as a leader in GenAI through investments in models like Dolly (an open-source instruction-following LLM) and its big-ticket acquisition of MosaicML. Databricks continues to echo the message that its Lakehouse is the best way for gen-native startups to train and deploy their own AI models, leveraging their proprietary data in a cost-effective manner without being tied to Big Tech.

LakehouseIQ Overview

What to Expect Going Forward

While the generative AI craze has continued unabated for 8+ months, recent events clearly signal that Snowflake and Databricks are taking the gloves off to compete for both mind share and market share in this space.

So what can we expect from this heightened Snowflake vs. Databricks rivalry moving forward?

  1. Acquisitions will continue → Snowflake and Databricks are both relatively well-positioned to continue acquiring smaller companies that complement their overall strategies. Snowflake has ~$4B of cash on its balance sheet, while Databricks maintains a rich valuation that doubles as usable currency. Meanwhile, hundreds of startups across AI and data tools yearn for an exit in a dry IPO market. We don’t expect Neeva and MosaicML to be the last acquisitions these giants will make, and there will be consolidation.
  2. Customers will benefit → One of the clear winners in the emerging battle between Snowflake and Databricks should be their customers. These giants are rapidly adding new and novel products and services to their platforms, building “one-stop shops” for their customers to build data applications and take advantage of LLMs. This platform augmentation will help democratize access to artificial intelligence and allow data scientists, data engineers, and AI practitioners to collaborate more meaningfully.
  3. Azure and AWS will make even more $$$ → As Snowflake and Databricks continue to push further into owning more of the AI market, they will require massive compute capabilities, primarily served by Azure and AWS, a point data engineer Anant Packidurali astutely observed. Similar to how Nvidia benefits no matter who “wins” in AI, the hyperscalers that underpin the compute needs for Snowflake and Databricks stand to gain regardless of who emerges victorious in the AI battle.

As enterprises rely more heavily on data to bolster their GenAI strategy, we believe Snowflake and Databricks are well-positioned to take advantage of the generational shift. Although they came from different parts of the value chain, and their relationship has evolved over the past decade, they are now squarely locked in a race with enormous rewards.

We are closely watching how new gen-native startups think about their data, AI, and ML strategy and how carefully they decide which data partner to work with.

