Running dbt in Snowpark - A Step-by-Step Guide
In previous articles, we explored how Snowpark Container Services can open doors to a complete data stack running solely on Snowflake (here) and showcased all essential tools Snowflake provides to achieve this (here). Now, it’s time to dive into the practical side of things. This article will guide you through a step-by-step implementation of running dbt in Snowpark Container Services, covering everything from setup and containerisation all the way to scheduling and monitoring. If you’re trying to create a simple containerised dbt setup, this guide will help you put all theory into action!
Pre-requisites
Ideally, you are familiar with Snowflake, dbt and git, and comfortable using the terminal.
In order to follow along you will need:
- Snowflake Account with ACCOUNTADMIN permissions
- Git
- Python
- Docker
- Optional: Snowflake CLI
We will use the existing sandbox dbt repository jaffle-shop, which contains some raw data and models. You will be asked to clone it later on. All the code and resources showcased in this article are captured in a public git repository: tasman-snowpark-dbt.
A note about dbt and Snowflake OAuth
As of today, the latest release of the dbt-snowflake library (1.8.3) still does not officially support OAuth authentication, which we need for Snowpark. However, the pre-release version 1.9.0b1 includes OAuth support, so that is what we are going to use for this tutorial. Do keep this in mind in case you plan to implement a Production setup.
Step-by-Step Implementation
Setting up Snowflake objects for dbt
We want to start by creating an isolated Snowflake setup for our implementation. Here are the SQL commands to be executed within Snowflake directly:
Database and Schemas
This is the main database that will contain the materialised dbt models, as well as all Snowpark related artifacts.
use role SYSADMIN ;
-- Database
create database DBT_SNOWPARK_DB ;
-- Schemas
use database DBT_SNOWPARK_DB ;
create schema JAFFLE_SHOP_LOCAL ; -- for local dbt runs
create schema JAFFLE_SHOP_SNOWPARK ; -- for Snowpark dbt runs
create schema RAW ; -- will be useful for storing raw data
Virtual Warehouse
Next up, create a Virtual Warehouse to run dbt and materialise its models.
use role SYSADMIN ;
create or replace warehouse DBT_SNOWPARK_WH
warehouse_size = 'XSMALL'
auto_suspend = 60
;
Custom Role
And now, a new custom role with appropriate permissions:
use role SECURITYADMIN;
create role DBT_SNOWPARK_ROLE ;
grant role DBT_SNOWPARK_ROLE to role SYSADMIN ; -- good practice
-- Role Grants on Database
grant USAGE, CREATE SCHEMA on database DBT_SNOWPARK_DB to role DBT_SNOWPARK_ROLE ;
-- Role Grants on dbt Schemas
grant all privileges on schema DBT_SNOWPARK_DB.JAFFLE_SHOP_SNOWPARK to role DBT_SNOWPARK_ROLE ;
grant all privileges on schema DBT_SNOWPARK_DB.JAFFLE_SHOP_LOCAL to role DBT_SNOWPARK_ROLE ;
grant all privileges on schema DBT_SNOWPARK_DB.RAW to role DBT_SNOWPARK_ROLE ;
-- Role Grants on Warehouse
grant usage, operate on warehouse DBT_SNOWPARK_WH to role DBT_SNOWPARK_ROLE ;
Grant the new role to yourself:
use role SECURITYADMIN ;
grant role DBT_SNOWPARK_ROLE to user <YOUR_USER> ;
ℹ️
In a real-life scenario, you could create a Role and a Virtual Warehouse to use exclusively for local/dev runs, and a separate Role and Warehouse only for Snowpark runs. This would provide better isolation of compute credit consumption.
For the sake of simplicity, in this tutorial we are using the same resources for everything.
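As a purely illustrative sketch of what that split could look like (the DBT_LOCAL_* names below are hypothetical and not used anywhere else in this tutorial):
-- Hypothetical dedicated resources for local/dev runs
use role SYSADMIN ;
create warehouse DBT_LOCAL_WH warehouse_size = 'XSMALL' auto_suspend = 60 ;
use role SECURITYADMIN ;
create role DBT_LOCAL_ROLE ;
grant role DBT_LOCAL_ROLE to role SYSADMIN ;
grant usage on database DBT_SNOWPARK_DB to role DBT_LOCAL_ROLE ;
grant all privileges on schema DBT_SNOWPARK_DB.JAFFLE_SHOP_LOCAL to role DBT_LOCAL_ROLE ;
grant usage, operate on warehouse DBT_LOCAL_WH to role DBT_LOCAL_ROLE ;
-- DBT_SNOWPARK_ROLE and DBT_SNOWPARK_WH would then be reserved for Snowpark runs only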
Setting up a working dbt repository
Create a working directory and clone the repo
Start by creating a new working directory on your machine, and clone the jaffle-shop dbt repository into it.
mkdir dbt-snowpark
cd dbt-snowpark
git clone git@github.com:dbt-labs/jaffle-shop.git
Create a virtual env and install dbt
Create a virtual environment called venv using the venv package, and activate it in your terminal. Run these commands in your root directory dbt-snowpark:
python -m venv venv
source venv/bin/activate
Install dbt-snowflake in your environment:
pip install git+https://github.com/dbt-labs/dbt-snowflake.git@v1.9.0b1
The command above is the same as pip install dbt-snowflake, with the sole difference that it points to a specific repository tag (v1.9.0b1).
To confirm all is installed correctly, check for the dbt version in your terminal. Run the command below:
dbt --version
You should have a response somewhat similar to this:
Core:
- installed: 1.8.7
- latest: 1.8.7 - Up to date!
Plugins:
- snowflake: 1.8.3 - Up to date!
Create a local dbt target
Store your credentials in your local profiles.yml file. Open the file ~/.dbt/profiles.yml (or equivalent if you are not using Mac/Linux), and add the following block:
jaffle_shop:
target: local
outputs:
local:
type: snowflake
account: <YOUR_SNOWFLAKE_ACCOUNT> # Example: ab12345.eu-west-1.aws
user: <YOUR_SNOWFLAKE_USER>
password: <YOUR_SNOWFLAKE_PASSWORD> # In case you use basic Auth
database: DBT_SNOWPARK_DB
schema: JAFFLE_SHOP_LOCAL
role: DBT_SNOWPARK_ROLE
warehouse: DBT_SNOWPARK_WH
threads: 8
client_session_keep_alive: False
authenticator: username_password_mfa # In case you have MFA enabled
🔐
Notice the authenticator line in the block above? We are using MFA and you should too! It is strongly recommended to enable MFA on all Snowflake user accounts.
Adjust the dbt project file
Before we can run dbt, we need to make small tweaks to the dbt_project.yml file.
- Ensure the source data from the project is loaded to Snowflake. This data is stored as CSV files in the folder jaffle-shop/jaffle-data. We need to tell dbt to scan this directory for seeds. To achieve this, add jaffle-data to the existing seed-paths.
- Update the profile attribute to jaffle_shop. This will ensure the profile is mapped to the credentials you added in your local profiles.yml.
After the changes, your dbt_project.yml should reflect both tweaks.
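As a rough sketch of the two relevant keys (assuming the stock jaffle-shop project layout; your file will contain additional settings such as model configurations):
name: jaffle_shop
profile: jaffle_shop # maps to the profile added in ~/.dbt/profiles.yml
seed-paths: ["seeds", "jaffle-data"] # jaffle-data added so dbt picks up the CSVs as seeds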
Run dbt locally
In case you already closed your terminal, open a new one, activate your venv, navigate to the folder jaffle-shop, and run dbt debug.
source venv/bin/activate
cd jaffle-shop
dbt debug
You need to point dbt to the local profile you created, and for that you need to export a special dbt environment variable:
export DBT_PROFILES_DIR=~/.dbt/
This is the path to the directory that contains the profiles.yml file in which the local profile was added.
Now you can run dbt debug:
dbt debug
You should see something like this:
12:45:37 Running with dbt=1.8.7
(...)
12:45:37 - git [OK found]
(...)
12:45:38 Connection test: [OK connection ok]
12:45:38 All checks passed!
Run dbt deps to install all dependencies:
dbt deps
And finally, run dbt build, which is a command that encapsulates dbt seed, dbt run and dbt test.
dbt build --target local
ℹ️
The target flag added to dbt is not needed, as it would default to the value local. We are including it here to be more explicit.
By the end you should be able to see the following objects in Snowflake:
- the dbt models
- the raw data from the seeds
And voilà! We have a working dbt repository 🎉
Containerise the dbt repository
The next step is to prepare a dbt Docker image for Snowpark to use.
Note about Snowpark Authentication to Snowflake
This is perhaps the trickiest part of the whole guide. To run dbt using Snowpark, we need to ensure Snowpark can connect to Snowflake using a special Service User that only exists while the container is running. And for that, we need to understand how this user will authenticate.
According to Snowflake’s documentation, the only recommended method for Snowpark to authenticate to Snowflake is via OAuth.
Whenever a new Snowpark Container is run, it includes 2 environment variables and an OAuth Token file, for the Snowflake Client to use while connecting. In our case, the client that needs to authenticate is our dbt-snowflake package.
These env vars are SNOWFLAKE_ACCOUNT and SNOWFLAKE_HOST. The file, called token, is stored in the following location: /snowflake/session/token. It contains an OAuth Token that can then be used for authentication.
⚠️
Using Snowflake’s Basic Auth (user + password) or key-pair is also possible, but do keep in mind this is not recommended. OAuth is the only way for Snowpark to access the Snowflake objects while remaining in Snowflake’s internal network. Using other Auth methods requires Snowpark to access the Snowflake account via the internet. To achieve this, additional networking configurations in Snowpark are necessary, and since the connection is made via the internet, it is less secure.
Create a snowpark dbt target
So far we have used the local target with our own accounts, but now we want Snowpark to authenticate, and additionally we want to use a different schema for our Snowpark runs.
This time, we are going to create a new file in the root of our dbt project called profiles.yml, and add the new target.
jaffle_shop:
outputs:
snowpark:
type: snowflake
authenticator: oauth
host: "{{ env_var('SNOWFLAKE_HOST') }}"
account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
token: "{{ env_var('SNOWFLAKE_TOKEN') }}"
database: "DBT_SNOWPARK_DB"
schema: "JAFFLE_SHOP_SNOWPARK"
role: "DBT_SNOWPARK_ROLE"
warehouse: "DBT_SNOWPARK_WH"
threads: 8
As you can see in the block, the host, account and token are expected as environment variables. But you might have also noticed that Snowpark provides only SNOWFLAKE_HOST and SNOWFLAKE_ACCOUNT as environment variables, not the Token… How to fix this?
Create a Docker Entrypoint Python File
The fact that the OAuth Token is stored as a file and not as an environment variable makes things a bit more difficult.
The approach we are going to take is to create a Docker entrypoint. In this case, the entrypoint is a Python script which will be executed when running the image. It will include logic to extract that Token from its file. Additionally, this script will help us specify the dbt command we want to execute every time a new Snowpark Container starts.
We start by writing a method to retrieve the Snowflake OAuth Token:
def get_snowflake_oauth_token() -> str:
with open("/snowflake/session/token", "r") as f:
return f.read()
You can then use the method to add the token to the Env Variables:
import os
os.environ["SNOWFLAKE_TOKEN"] = get_snowflake_oauth_token()
Assuming the dbt command is passed in the command arguments, here is how to parse it:
import argparse
def parse_command():
parser = argparse.ArgumentParser()
parser.add_argument("--command", type=str, default="dbt debug")
args = parser.parse_args()
command = args.command
return command
And we can invoke dbt programmatically using our dbt package:
from dbt.cli.main import dbtRunner
def run_dbt_command(command: str) -> None:
dbt = dbtRunner()
cli_args = command.split(" ")
cli_args.remove("dbt") # "dbt" not required to be in the list
result = dbt.invoke(cli_args)
if result.exception:
raise Exception(f"dbt command failed! Error: {result.exception}")
for r in result.result:
logger.info(f"{r.node.name}: {r.status}")
Tidying things up:
import argparse
import logging
import os
import sys
from dbt.cli.main import dbtRunner
logging.basicConfig(stream=sys.stdout, level=logging.INFO) # To enable logs in Snowpark
logger = logging.getLogger(__name__)
def get_snowflake_oauth_token() -> str:
with open("/snowflake/session/token", "r") as f:
return f.read()
def parse_command():
parser = argparse.ArgumentParser()
parser.add_argument("--command", type=str, default="dbt debug")
args = parser.parse_args()
command = args.command
return command
def run_dbt_command(command: str) -> None:
dbt = dbtRunner()
cli_args = command.split(" ")
cli_args.remove("dbt") # "dbt" not required to be in the list
result = dbt.invoke(cli_args)
if result.exception:
raise Exception(f"dbt command failed! Error: {result.exception}")
for r in result.result:
logger.info(f"{r.node.name}: {r.status}")
def main():
os.environ["SNOWFLAKE_TOKEN"] = get_snowflake_oauth_token()
dbt_command = parse_command()
logger.info(f"Running command '{dbt_command}'...")
run_dbt_command(dbt_command)
if __name__ == "__main__":
main()
In summary, the script allows you to supply a dbt command and executes it while handling the chores (such as retrieving the OAuth token).
Example of an entrypoint.py run:
python entrypoint.py --command "dbt run --target snowpark"
⚠️
Notice that if you run entrypoint.py locally, it won’t work! This is because the script was designed only for use with Snowpark. The code assumes the Snowflake Env Vars are loaded, and that the Snowflake Token file exists in /snowflake/session/token, which is not the case in our local setup.
Add a requirements.txt file to the repo
This will help the containerisation process by providing the required Python dependencies.
Create a file called requirements.txt in your root folder, and include the dbt-snowflake package:
git+https://github.com/dbt-labs/dbt-snowflake.git@v1.9.0b1
You may notice there is also a requirements.txt file living in the folder of the dbt project. That file won’t be used, so you can delete it.
ℹ️
Consider using a packaging and dependency management tool instead of creating a requirements file manually! Tasman loves Poetry and uses it in most Python projects. It was not used in this tutorial to keep things simple.
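For reference, a minimal Poetry pyproject.toml for this setup could look roughly like the sketch below; the project metadata is purely illustrative:
[tool.poetry]
name = "dbt-snowpark"
version = "0.1.0"
description = "Containerised dbt running on Snowpark"
authors = ["Your Name <you@example.test>"]

[tool.poetry.dependencies]
python = "^3.11"
# Same pre-release tag as the requirements file above
dbt-snowflake = { git = "https://github.com/dbt-labs/dbt-snowflake.git", tag = "v1.9.0b1" }

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
If you go down this route, the Dockerfile below would then install dependencies via Poetry instead of the requirements file.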
Create a Dockerfile
With an entrypoint file and a dbt profile already created, the next step is to create a Dockerfile. Create a file Dockerfile in the root folder:
FROM python:3.11-slim
WORKDIR /usr
ENV DBT_PROJECT_DIR=/usr/dbt
ENV DBT_PROFILES_DIR=$DBT_PROJECT_DIR
ENV DBT_PROJECT_DIR_NAME=jaffle-shop
ENV PATH="/usr/.venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY ${DBT_PROJECT_DIR_NAME} ${DBT_PROJECT_DIR}
WORKDIR ${DBT_PROJECT_DIR}
RUN dbt deps
ENTRYPOINT ["python", "/usr/dbt/entrypoint.py" ]
In a nutshell, what this Dockerfile does:
- Uses the python:3.11-slim image, which is a lightweight version of the official Python image
- Copies the requirements file and installs dependencies using pip
- Copies the dbt project
- Sets the image entrypoint to the entrypoint.py script we added to the repo
To build the docker image, you can run the following command:
docker buildx build -t dbt-snowpark .
Now, as stated before, you can try running a container based on this image, but this won’t work because the image is designed to work only with Snowpark. The Snowpark variables and token file are not in place locally. If you want to give it a try:
docker run --rm dbt-snowpark
The error you get should look something like this:
FileNotFoundError: [Errno 2] No such file or directory: '/snowflake/session/token'
As expected. But once you run this using Snowpark, this won’t happen.
Setting up Snowflake objects for Snowpark
Now that we have a Docker image ready to use, we have to create the necessary components in Snowflake for Snowpark to be able to run the container.
Compute Pool
A Compute Pool is a collection of one or more Virtual Machine (VM) nodes in which Snowflake runs your Snowpark Containers. In this case, this will provide the computing power to be used by the dbt job.
Let’s start by creating the Compute Pool. The creation can only be done by ACCOUNTADMIN, or alternatively by a custom role with create compute pool on account grants.
Here is how to do it:
use role ACCOUNTADMIN ;
create compute pool if not exists DBT_SNOWPARK_COMPUTE_POOL
MIN_NODES = 1
MAX_NODES = 1
INSTANCE_FAMILY = CPU_X64_XS
;
You can now check the details on the resource:
describe compute pool DBT_SNOWPARK_COMPUTE_POOL;
Finally, we want to ensure our custom role can use the Compute Pool we just created.
use role SECURITYADMIN ;
grant USAGE, OPERATE on compute pool DBT_SNOWPARK_COMPUTE_POOL to role DBT_SNOWPARK_ROLE ;
Image Repository
The next step is to create an Image Repository in Snowflake. This is where we will upload the Docker image we created.
The Image Repository lives inside a database schema. In this case, we previously created a schema called JAFFLE_SHOP_SNOWPARK, which is meant to hold all Snowpark dbt related artifacts. The Docker Image Repository will fit quite well there.
This is how to do it:
use role DBT_SNOWPARK_ROLE ;
create image repository DBT_SNOWPARK_DB.JAFFLE_SHOP_SNOWPARK.IMAGE_REPOSITORY ;
Confirm it was successfully created by running this:
show image repositories in schema DBT_SNOWPARK_DB.JAFFLE_SHOP_SNOWPARK ;
The query result will contain a value repository_url, which should look similar to this:
xxxx-xxxx.registry.snowflakecomputing.com/dbt_snowpark_db/jaffle_shop_snowpark/image_repository
Take note of this value as we will use it in the next steps.
Push Docker image to Snowpark
Start by getting the Registry Name. This is the first part of the repository_url noted above. It should look like this: xxxx-xxxx.registry.snowflakecomputing.com.
The docker login command looks like this:
docker login xxxxx-xxxxx.registry.snowflakecomputing.com -u <YOUR_USER>
Run the command and type your password when prompted.
ℹ️
If you have MFA enabled on your account, you might notice that the login process can be a bit buggy! Consider temporarily bypassing MFA on your account for a couple of minutes if you are only following this tutorial, and use a Service Account for this in a production scenario.
Tag the Docker image with the repository_url value, followed by the image name:
docker tag dbt-snowpark xxxx-xxxx.registry.snowflakecomputing.com/dbt_snowpark_db/jaffle_shop_snowpark/image_repository/dbt-snowpark
And push:
docker push xxxx-xxxx.registry.snowflakecomputing.com/dbt_snowpark_db/jaffle_shop_snowpark/image_repository/dbt-snowpark
You can run the following query to confirm the image has been pushed:
select SYSTEM$REGISTRY_LIST_IMAGES('/dbt_snowpark_db/jaffle_shop_snowpark/image_repository');
It should return something like this:
{"images":["dbt-snowpark"]}
Voilà! Your image is now in Snowflake 🎉
Running containerised dbt in Snowpark
Execute the Service
This is the moment of truth! Let’s trigger a containerised Snowpark dbt run. For that, we are going to run the following SQL command:
use role DBT_SNOWPARK_ROLE ;
execute job service
in compute pool DBT_SNOWPARK_COMPUTE_POOL
name = 'dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job'
from specification $$
spec:
containers:
- name: dbt-job
image: xxxx-xxxx.registry.snowflakecomputing.com/dbt_snowpark_db/jaffle_shop_snowpark/image_repository/dbt-snowpark
command: [ "python", "/usr/dbt/entrypoint.py", "--command", "dbt build --target snowpark" ]
$$
;
Once finished, you should be able to see your raw data and models in the JAFFLE_SHOP_SNOWPARK schema! 🎉
ℹ️
The execute job service command is synchronous. Once you trigger it, the query only finishes when the container exits (successfully or not).
Under the Hood
The command execute job service will spin up the Docker container according to the instructions in the specification block, and the query finishes once the container exits (regardless of success or failure).
The service and its container will still be available for some minutes (in our experience between 30 and 60 minutes), after which Snowflake performs a cleanup by dropping the service. You can access the details by running the following query:
show service containers in service dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job ;
You can also access the logs produced by the container using this query (already parsed):
-- Returns the last 100 log lines
with service_logs as (
select system$get_service_logs('dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job', 0, 'dbt-job', 100) as logs
)
select value as log
from service_logs, lateral split_to_table(logs, '\n')
;
In this case, the query output will look familiar to anyone who has used dbt before, as it contains the standard dbt log lines.
If you want to run the job a second time and the cleanup hasn’t finished yet, you won’t be able to! What you can do is force the cleanup yourself, by running a drop service statement:
use role DBT_SNOWPARK_ROLE ;
drop service if exists dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job ;
After this, you will be able to trigger the job again.
ℹ️
In a real-life scenario, you might want to always run a drop service if exists statement before executing your service.
Additional Considerations
- The Snowflake user executing the service is a special Service User that exists only while the service is running, and it uses the role DBT_SNOWPARK_ROLE.
- The dbt service is executed by the compute pool we initially created.
- The SQL models triggered by dbt are run by the special Service User as well, under the role DBT_SNOWPARK_ROLE and using the virtual warehouse DBT_SNOWPARK_WH. This is all specified in the profiles.yml file that we created before.
- The Service User is allowed to run all dbt generated queries by authenticating with the OAuth token and Env Vars that were set up in the entrypoint.py file, and by assuming the role DBT_SNOWPARK_ROLE.
Logs, Scheduling, Error Notifications
Now that we have a way for Snowpark to execute dbt, we can address additional functionalities to make it production-ready.
Access Persistent Logs
You can use an Event Table to access all logs even after the services are dropped.
The following query will give you all logs from past jobs:
use role ACCOUNTADMIN ;
select TIMESTAMP, VALUE, RESOURCE_ATTRIBUTES
from <YOUR_EVENT_TABLE>
where RESOURCE_ATTRIBUTES:"snow.service.name" = 'DBT_SNOWPARK_JOB'
and VALUE != ''
order by timestamp desc
limit 100;
As an idea, this SQL logic could be captured in a view to simplify its usage.
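As a sketch, such a view could simply wrap the query above (the view name is arbitrary and <YOUR_EVENT_TABLE> remains a placeholder for your account's event table):
use role ACCOUNTADMIN ;
create or replace view DBT_SNOWPARK_DB.JAFFLE_SHOP_SNOWPARK.DBT_SNOWPARK_LOGS as
select TIMESTAMP, VALUE, RESOURCE_ATTRIBUTES
from <YOUR_EVENT_TABLE>
where RESOURCE_ATTRIBUTES:"snow.service.name" = 'DBT_SNOWPARK_JOB'
and VALUE != ''
;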
The column RESOURCE_ATTRIBUTES also contains more details on each Snowpark Container run. It is especially useful to filter by snow.service.container.run.id to isolate the logs from a single run:
use role ACCOUNTADMIN ;
select TIMESTAMP, VALUE, RESOURCE_ATTRIBUTES
from <YOUR_EVENT_TABLE>
where RESOURCE_ATTRIBUTES:"snow.service.container.run.id" = '3609f3'
and VALUE != ''
order by timestamp desc
limit 100;
Schedule Runs with Error Notifications
Enabling scheduled runs and triggering error notifications, all within Snowflake, closes the loop on the work we have done so far. For this, we are going to use a combination of Snowflake Procedures, Tasks, and a Notification Integration.
Let’s start by creating a Notification Integration to allow Snowflake to send emails to a set of users.
use role ACCOUNTADMIN ;
create notification integration dbt_snowpark_failure_notification_integration
type = EMAIL
enabled = TRUE
allowed_recipients = ('email1@email.test','user2@email.test')
;
You can test sending emails using the following procedure:
call SYSTEM$SEND_EMAIL(
'dbt_snowpark_failure_notification_integration', -- Notification Integration
'email1@email.test, email2@email.test', -- Recipient List
'Test Subject', -- Email Subject
'Test Email Body' -- Email Content
);
💡
The email addresses used in the Notification Integration need to be verified within the Snowflake Console! In case you want to send emails to an address that does not belong to a user (e.g. a mailing list, or a Slack channel email address), consider creating a Snowflake user solely for that purpose.
Now, let’s ensure our custom role has enough permissions to send emails:
use role ACCOUNTADMIN ;
grant usage on integration dbt_snowpark_failure_notification_integration to role DBT_SNOWPARK_ROLE ;
With email notifications ready, now let’s create a Snowflake Procedure to trigger the job.
use role DBT_SNOWPARK_ROLE ;
create or replace procedure run_dbt_snowpark()
returns string
language sql
execute as caller
as
declare
error_message STRING ;
begin
-- 1. Clean Service if needed
drop service if exists
dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job ;
-- 2. Execute Job
execute job service
in compute pool DBT_SNOWPARK_COMPUTE_POOL
name = 'dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job'
from specification $$
spec:
containers:
- name: dbt-job
image: xxxx-xxxx.registry.snowflakecomputing.com/dbt_snowpark_db/jaffle_shop_snowpark/image_repository/dbt-snowpark
command: [ "python", "/usr/dbt/entrypoint.py", "--command", "dbt build --target snowpark" ]
$$;
return 'Snowpark dbt job completed successfully!' ;
exception
-- Send email when the job fails
when OTHER then
error_message := 'Snowpark dbt Run failed.\n' || 'SQL Code: ' || SQLCODE || '\n' || 'Error Message: ' || SQLERRM || '\n' || 'SQL State: ' || SQLSTATE || '\nPlease check the Events Table to access the dbt logs.';
call SYSTEM$SEND_EMAIL(
'dbt_snowpark_failure_notification_integration',
'email1@email.test, email2@email.test',
'Snowpark dbt job Failure',
:error_message
);
return error_message;
end;
What does this Procedure do?
- Performs a Service cleanup (if exists)
- Executes the Snowpark Service Job
In case any of the above steps fails, an email is sent to the provided recipients, letting them know that the job has failed, along with some error information.
You can test a manual run of this procedure by running:
use role DBT_SNOWPARK_ROLE ;
call run_dbt_snowpark() ;
The only part missing now is scheduling! And for that, we are going to use a Snowflake Task.
Let’s start by granting the necessary permissions to our custom role:
use role ACCOUNTADMIN ;
grant EXECUTE TASK on account to role DBT_SNOWPARK_ROLE ;
use role SECURITYADMIN ;
grant CREATE TASK, EXECUTE TASK on schema DBT_SNOWPARK_DB.JAFFLE_SHOP_SNOWPARK to role DBT_SNOWPARK_ROLE ;
And now let’s create our task:
use role DBT_SNOWPARK_ROLE ;
create or replace task DBT_SNOWPARK_DAILY_RUN_TASK
warehouse = DBT_SNOWPARK_WH
schedule = 'USING CRON 0 7 * * * Europe/Amsterdam'
AS
call run_dbt_snowpark()
;
In this case, the task will trigger the procedure every day at 7 AM in the Europe/Amsterdam timezone. You can adapt the CRON expression to the desired frequency and create as many Tasks as required! You can check the CRON GURU website to learn more about CRON expressions.
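One detail to keep in mind: newly created Snowflake tasks are suspended by default, so the schedule only takes effect after you resume the task:
use role DBT_SNOWPARK_ROLE ;
alter task DBT_SNOWPARK_DAILY_RUN_TASK resume ;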
Bonus
Executing the Job using Snowflake CLI
The job can also be triggered ad hoc using a terminal/shell script. This can be quite useful for scheduling the job from a 3rd party tool (think orchestrators like Airflow, Dagster, Prefect), or for running jobs in a CI/CD pipeline (e.g. for testing purposes).
To do this, you need to create a yaml spec file. You can use the contents of the specification block in the execute job service query and save it in a yaml file.
For this case, a file called snowpark_dbt_spec.yaml was created in the root folder of the repository.
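Its contents mirror the specification block from the execute job service query used earlier (replace the image with your own repository_url):
spec:
  containers:
    - name: dbt-job
      image: xxxx-xxxx.registry.snowflakecomputing.com/dbt_snowpark_db/jaffle_shop_snowpark/image_repository/dbt-snowpark
      command: [ "python", "/usr/dbt/entrypoint.py", "--command", "dbt build --target snowpark" ]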
ℹ️
As you can see, you can use different spec files for different jobs (e.g. one for dbt runs and one for dbt tests, or different specs for different run frequencies).
Then all you need to do is run the following command:
snow spcs service execute-job \
dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job \
--compute-pool DBT_SNOWPARK_COMPUTE_POOL \
--spec-path snowpark_dbt_spec.yaml \
--role DBT_SNOWPARK_ROLE \
--connection <YOUR_CONNECTION>
And you can do the cleanup as well:
snow spcs service drop \
dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job \
--role DBT_SNOWPARK_ROLE \
--connection <YOUR_CONNECTION>
Conclusion
In this tutorial, we covered how to set up a sample dbt project, run it locally, containerise it and run it in Snowpark. We finished off with scheduling the job, logging, and alerting in case of failures.
A lot more could be done to improve the setup! One idea to take the concept of Snowflake as a single solution even further is to leverage the native Streamlit Apps to build operational dashboards, cost monitoring, and even serve HTML content such as the dbt docs and the elementary report.
Since Snowpark is also still relatively new, the ecosystem around it is not fully developed yet, which might be limiting in some ways. For example, you still cannot use Infrastructure-as-Code to create Snowpark resources like Compute Pools or Image Repositories (though Terraform is currently working on it, so we can count on it soon).
We hope this is useful, and if you have nice Snowpark tips, let us know so we can include them in the article!