Snowflake is a now 10-year-old cloud technology that helps companies get their data under control. But with so many data technologies on the market, what makes Snowflake different?
From day one, Snowflake has distinguished itself from the vast majority of other databases through its full-SaaS, zero-management approach and a few key technologies such as Data Sharing, micro-partitions, zero-copy cloning, and the decoupling of processing from I/O management. We will detail some of these features here.
What is Snowflake?
Snowflake was founded in California in 2012 and publishes and distributes the eponymous Data Cloud platform, which was officially launched in October 2014.
Designed for and available only in the cloud, the Snowflake platform makes it possible to manage very large volumes of data while minimizing the technical skills required.
On September 16, 2020, just 8 years after its creation, the company went public on the New York Stock Exchange in one of the largest IPOs in the software world. The Snowflake platform is now used by more than 6,300 customers worldwide and has reported a net revenue retention (NRR) rate of nearly 174% in recent years.
Where did Snowflake come from?
Two of Snowflake's founders were French engineers working at Oracle as architects on traditional analytical systems. As new big data systems came to market, they shared the same frustration: so-called classical database architectures showed obvious limitations when facing new workloads.
They decided to build a brand-new data platform from scratch and, sensing in 2012 the impact cloud computing would have on the industry, decided that their database would run only in the cloud. This freed them from numerous constraints and hardware variants, so they could focus on the so-called “high” layers of their software.
After two years of R&D, Snowflake became available in 2014 on Amazon Web Services, as cloud computing was starting to make headlines.
What are the Top 4 Features of Snowflake?
As mentioned above, there are several key features that help distinguish Snowflake from other data technologies on the market. Serverless technology, micro-partitions, separation of I/O and processing, and multi-cloud functionality all give Snowflake its competitive advantage. Let’s take a look at those features in more detail:
1. A Serverless technology
Snowflake is a so-called “Serverless” technology: all the necessary resources (compute, storage, network) are managed and provided by Snowflake on demand.
2. Micro-partitions
One of the major features that allows Snowflake to reach high performance levels while dealing with one of the major issues of cloud platforms is its micro-partitioning system. Each micro-partition holds a small subset of a table's data, stored vertically (that is, per column or group of columns), together with a set of metadata (value ranges, number of distinct values, and other information used to accelerate queries and processing).
All tables are automatically stored using this micro-partition system and are partitioned as data is loaded.
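To make the role of the metadata concrete, here is a minimal sketch of how value ranges let a query skip partitions it cannot match. It is a toy model, not Snowflake's actual implementation: the class MicroPartition, the functions build_partitions and scan_range, and the partition size of 10 rows are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MicroPartition:
    """Toy stand-in for a micro-partition: one column's values plus metadata."""
    values: Tuple[int, ...]
    min_value: int  # metadata: smallest value stored in this partition
    max_value: int  # metadata: largest value stored in this partition


def build_partitions(rows: List[int], size: int) -> List[MicroPartition]:
    """Split rows into fixed-size partitions in arrival order, as data is loaded."""
    partitions = []
    for start in range(0, len(rows), size):
        chunk = tuple(rows[start:start + size])
        partitions.append(MicroPartition(chunk, min(chunk), max(chunk)))
    return partitions


def scan_range(partitions: List[MicroPartition], low: int, high: int):
    """Return the matching values and the number of partitions actually read."""
    matches = []
    scanned = 0
    for partition in partitions:
        # Metadata check only: skip partitions whose range cannot overlap.
        if partition.max_value < low or partition.min_value > high:
            continue
        scanned += 1
        matches.extend(v for v in partition.values if low <= v <= high)
    return matches, scanned


# Hypothetical data: the integers 1..100 loaded in order, 10 per partition.
partitions = build_partitions(list(range(1, 101)), size=10)
matches, scanned = scan_range(partitions, 42, 47)
print(f"{len(matches)} rows found by reading {scanned} of {len(partitions)} partitions")
```

In this toy case only one partition out of ten is read, because the other nine are ruled out from their stored value ranges alone. The benefit depends on how well the loaded data happens to be ordered on the filtered column.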
3. Separation of I/O & Processing
Another feature to know about Snowflake is its architecture, which separates data access from data processing.
Thus, different compute clusters (virtual warehouses) can answer a virtually unlimited number of requests in parallel, whatever the nature of the processing. Virtual warehouses do not require any administration, only the choice of a compute size, named like t-shirt sizes: from XS to 6XL.
It is on this last characteristic that Snowflake differs fundamentally from traditional architectures, which must handle network, processing, and memory management within the same server.
For these reasons, Snowflake falls into the category of so-called “Cloud Native” solutions.
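The t-shirt sizing of virtual warehouses can be illustrated with a few lines of arithmetic. This is a simplified sketch under stated assumptions that the article does not spell out: sizes run from XS to 6XL, each step up doubles the compute of the previous one, XS consumes one credit per hour, and billing is purely proportional to running time (minimum billing periods are ignored). The names WAREHOUSE_SIZES, relative_capacity, and estimated_credits are hypothetical.

```python
# Assumed list of sizes in ascending order; XS is taken as the baseline.
WAREHOUSE_SIZES = ["XS", "S", "M", "L", "XL", "2XL", "3XL", "4XL", "5XL", "6XL"]


def relative_capacity(size: str) -> int:
    """Relative compute of a size, assuming each step doubles the previous one."""
    return 2 ** WAREHOUSE_SIZES.index(size)


def estimated_credits(size: str, seconds: float) -> float:
    """Credits consumed, assuming XS costs 1 credit per hour, billed per second."""
    return relative_capacity(size) * seconds / 3600


# Two independent warehouses working on the same stored data.
reporting = estimated_credits("XS", seconds=3600)  # one hour of light reporting
batch = estimated_credits("L", seconds=1800)       # thirty minutes of heavy batch
print(reporting, batch)
```

Because compute is separate from storage, each workload can pick its own size and be paused independently, without moving or duplicating the data.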
4. Multi-cloud
Snowflake is available on the three major cloud platforms: Amazon Web Services, Microsoft Azure, and Google Cloud Platform.
Snowflake works the same way on every cloud platform and, since 2021, has made it possible to synchronize environments deployed on two different cloud providers transparently and in real time.
What are the best-known use cases for Snowflake?
Snowflake’s use cases are numerous and continue to evolve rapidly: Analytical Applications, Data Lake, Data Warehouse, Data Science, and Data Applications, in batch, micro-batch, or real-time mode.
1. Analytical Applications
The initial architecture of Snowflake is based on data storage and processing via SQL. It is therefore natural to find analytical applications as the most widespread use case.
A typical scenario involves loading a very large volume of data, transforming it, and exposing a set of tables and views to an advanced reporting or analytics tool.
The technology outlined in the previous section allows users to store, process, and analyze virtually unlimited amounts of data. Indeed, the hardware limits inherent in other databases are largely pushed back, and it is not uncommon to divide response times by multiples of 10.
2. Data Lake / Data Warehouse
The very low cost of data storage, the absence of data tiering within Snowflake, and its capacity to process large volumes of data make it possible to implement both use cases on the same platform, whereas they are often handled on different platforms using different technologies.
3. Data Science
The implementation of Data Science use cases can be done through different approaches:
- Snowpark, a code execution environment for processing data stored in Snowflake (Scala, Java, Python)
- Use of an external platform highly integrated with Snowflake (like Dataiku or DataRobot)
- Model execution from inside the platform via external User-Defined Functions (UDFs)
4. Data Sharing
Early on, Snowflake introduced a Data Sharing service that allows Snowflake deployments belonging to two different organizations to share data securely and in real time within the Snowflake platform, without having to build inter-organization data exchange flows that are costly and complex to maintain.
Data can be shared publicly and monetized within the Data Marketplace.
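The idea of sharing without exchange flows can be pictured with a small toy model. It is only a conceptual sketch, not Snowflake's API: the classes Table and Share and the method grant_select are hypothetical names. The point it illustrates is that a share holds a reference to the provider's single stored copy, so the consumer sees updates immediately and nothing is exported or duplicated.

```python
class Table:
    """Single stored copy of the data, owned by the provider."""

    def __init__(self, name):
        self.name = name
        self.rows = []  # each row is an immutable tuple


class Share:
    """Named set of read grants; holds references to tables, not copies."""

    def __init__(self, name):
        self.name = name
        self._tables = {}

    def grant_select(self, table):
        # Register a reference to the provider's table (no data is copied).
        self._tables[table.name] = table

    def read(self, table_name):
        # Read-only snapshot of the current rows, as a tuple of tuples.
        return tuple(self._tables[table_name].rows)


# Provider side: create a table and expose it through a share.
orders = Table("orders")
orders.rows.append((1, 120))
share = Share("sales_share")
share.grant_select(orders)

# Consumer side: read, then read again after the provider adds a row.
before = share.read("orders")
orders.rows.append((2, 75))
after = share.read("orders")
print(len(before), len(after))
```

The second read reflects the provider's new row without any transfer step, which is the property the paragraph above describes as real-time sharing.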
5. Real-Time
Real-time capabilities are available on two fronts. For ingestion, the Snowpipe feature loads data as soon as its presence is detected on a cloud file storage system. For access, an API allows data in the platform to be queried over REST. A Kafka connector is also available.
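The ingestion side can be sketched as an event-driven loader that reacts to each newly detected file. This is a simplified simulation, not Snowpipe itself: the class AutoIngestPipe, the method on_file_detected, and the file names are hypothetical, and the sketch assumes that a file already loaded should be skipped if it is notified a second time.

```python
class AutoIngestPipe:
    """Toy pipe: loads each newly detected file exactly once."""

    def __init__(self):
        self.loaded_files = set()  # load history, used to skip known files
        self.table = []            # destination rows

    def on_file_detected(self, file_name, lines):
        """Handle a 'new file' notification from the simulated storage location."""
        if file_name in self.loaded_files:
            return 0  # already ingested: ignore the duplicate notification
        self.table.extend(lines)
        self.loaded_files.add(file_name)
        return len(lines)


pipe = AutoIngestPipe()
first = pipe.on_file_detected("events_001.csv", ["a,1", "b,2"])
repeat = pipe.on_file_detected("events_001.csv", ["a,1", "b,2"])
second = pipe.on_file_detected("events_002.csv", ["c,3"])
print(first, repeat, second, len(pipe.table))
```

Each file is loaded when its notification arrives rather than on a batch schedule, which is what keeps the delay between a file landing and its rows being queryable short.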
Partnerships & Investment
To support the development of its ecosystem, Snowflake has set up a dedicated subsidiary, Snowflake Ventures, which invests in external solutions, strengthening their integration with its platform and offering their common customers a longer-term relationship.
This is notably the case for about 20 companies, including Alation, Collibra, Dataiku, DataRobot, DataOps.Live, dbt Labs, ThoughtSpot, and many others.