Snowflake as a Data Stack - Is This The Right Place?
In previous articles (here and here), Marcello and Miguel from the Data Engineering team showcased the capabilities of Snowflake’s Snowpark Container Services. This is a powerful feature, as it lets you run containerised jobs within Snowflake without managing the underlying infrastructure. The possibilities are vast: think running data ingestion scripts, triggering transformation jobs, sending reports, or performing reverse ETL operations, all within the Snowflake ecosystem.
In practice, a simple, compact but complete Data Stack could be built in Snowflake without the need for an additional cloud provider or third-party data ingestion tools. All data is kept within Snowflake. The stack covers not only running jobs but also orchestration (Snowflake Tasks), logging (Snowflake Event Tables), alerting (native email notifications), and monitoring/observability (Streamlit dashboards).
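To make the orchestration and alerting pieces more concrete, here is a minimal sketch in Python that only builds the SQL text for a scheduled Task running a container job and for an email notification. It is not taken from the earlier articles: the helper names and every object name (task, compute pool, job, stage, spec file, notification integration) are hypothetical, and it assumes your account allows a Task body to call `EXECUTE JOB SERVICE` directly (wrapping the call in a stored procedure is an alternative if it does not).

```python
from textwrap import dedent


def scheduled_job_task_sql(task, compute_pool, job, stage, spec_file, cron, tz="UTC"):
    """Build a CREATE TASK statement that runs a container job on a cron schedule.

    All identifiers are placeholders supplied by the caller; nothing is
    validated or escaped here.
    """
    return dedent(f"""\
        CREATE OR REPLACE TASK {task}
          SCHEDULE = 'USING CRON {cron} {tz}'
        AS
          EXECUTE JOB SERVICE
            IN COMPUTE POOL {compute_pool}
            NAME = {job}
            FROM @{stage}
            SPECIFICATION_FILE = '{spec_file}'""")


def failure_email_sql(integration, recipients, subject, body):
    """Build a call to SYSTEM$SEND_EMAIL for a simple alert.

    Assumes an email notification integration already exists and that the
    subject and body contain no single quotes (no escaping is done).
    """
    recipient_list = ", ".join(recipients)
    return (
        f"CALL SYSTEM$SEND_EMAIL('{integration}', '{recipient_list}', "
        f"'{subject}', '{body}')"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(scheduled_job_task_sql(
        task="ingest_task",
        compute_pool="my_pool",
        job="ingest_job",
        stage="specs",
        spec_file="ingest.yaml",
        cron="0 6 * * *",
    ))
    print(failure_email_sql(
        "my_email_int", ["data-team@example.com"], "Ingestion failed", "Check the event table."
    ))
```

The generated statements could then be run from a worksheet or any Snowflake client. Keeping them as plain strings is a deliberate simplification: it shows how few moving parts are involved without depending on a particular connector.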
Some of the key benefits are:
- Quick time to value: the setup of a single tool is faster since all services are fully integrated. There is more time to focus on the activities that yield business value and less on the technical and infrastructure setup.
- Narrower required skillset: a simpler data stack that can be supported with a small or part-time data team, without requiring extensive experience working with infrastructure and multiple cloud providers.
- Operational efficiency: having data centralised in a single place means less data movement and decreased overhead, as there is only one system to manage.
- Admin efficiency: there are fewer contractual and legal chores to handle, and it is easier to monitor budget/costs.
There are some trade-offs to running a Data Stack exclusively in Snowflake. Some disadvantages are:
- Vendor lock-in: implementing a non-open-source solution increases vulnerability to unexpected changes in product offerings or pricing. Moreover, it creates significant friction if you decide to switch solutions in the future.
- Basic functionality: while Snowflake is renowned for its excellent Data Warehousing capabilities, it does not necessarily offer the best features in every single category. For example, when it comes to orchestration, Snowflake provides only basic functionality compared with dedicated orchestrators on the market, such as Dagster, Prefect, or Orchestra.
It’s worth noting that long-running Snowpark Container Services can host open-source solutions, allowing you to choose the best tools for specific tasks. For example, you could use Meltano or Airbyte for data ingestion, and Dagster or Prefect for orchestration, all within Snowflake!
Considering these trade-offs, some organisations would benefit from such a simplified solution, for example:
- Startups in need of quick insights to validate their business model
- Scaleups with limited (or non-existent) data teams or engineering resources
A new article will be coming out soon, showcasing a small but detailed implementation with code samples. Stay tuned! 👀