In previous articles, we explored how Snowpark Container Services can open doors to a complete data stack running solely on Snowflake (here) and showcased all essential tools Snowflake provides to achieve this (here). Now, it’s time to dive into the practical side of things. This article will guide you through a step-by-step implementation of running dbt in Snowpark Container Services, covering everything from setup and containerisation all the way to scheduling and monitoring. If you’re trying to create a simple containerised dbt setup, this guide will help you put all theory into action!

Pre-requisites

Ideally, you are familiar with Snowflake, dbt, and git, and comfortable using terminal commands.

In order to follow along you will need:

  • Snowflake Account with ACCOUNTADMIN permissions
  • Git
  • Python
  • Docker
  • Optional: Snowflake CLI

We will use the existing sandbox dbt repository jaffle-shop, which contains some raw data and models. You will be asked to clone it later on. All the code and resources showcased in this article are captured in a public git repository: tasman-snowpark-dbt.

A note about dbt and Snowflake OAuth

As of today, the latest dbt-snowflake release (1.8.3) still does not officially support OAuth authentication, which is necessary for Snowpark to work. However, the pre-release version 1.9.0b1 includes OAuth support, so that is what we are going to use for this tutorial. Keep this in mind if you plan to implement a production setup.

Step-by-Step Implementation

Setting up Snowflake objects for dbt

We want to start by creating an isolated Snowflake setup for our implementation. Here are the SQL commands to be executed within Snowflake directly:

Database and Schemas

This is the main database that will contain the materialised dbt models, as well as all Snowpark related artifacts.

use role SYSADMIN ;

-- Database
create database DBT_SNOWPARK_DB ;

-- Schemas
use database DBT_SNOWPARK_DB ;

create schema JAFFLE_SHOP_LOCAL ; -- for local dbt runs
create schema JAFFLE_SHOP_SNOWPARK ; -- for Snowpark dbt runs
create schema RAW ; -- will be useful for storing raw data

Virtual Warehouse

Next up, create a Virtual Warehouse to run dbt and materialise its models.

use role SYSADMIN ;
create or replace warehouse DBT_SNOWPARK_WH 
	warehouse_size = 'XSMALL'
	auto_suspend = 60 
;

Custom Role

And now, a new custom role with appropriate permissions:

use role SECURITYADMIN;
create role DBT_SNOWPARK_ROLE ;
grant role DBT_SNOWPARK_ROLE to role SYSADMIN ; -- good practice

-- Role Grants on Database 
grant USAGE, CREATE SCHEMA on database DBT_SNOWPARK_DB to role DBT_SNOWPARK_ROLE ;

-- Role Grants on dbt Schemas
grant all privileges on schema DBT_SNOWPARK_DB.JAFFLE_SHOP_SNOWPARK to role DBT_SNOWPARK_ROLE ;
grant all privileges on schema DBT_SNOWPARK_DB.JAFFLE_SHOP_LOCAL to role DBT_SNOWPARK_ROLE ;
grant all privileges on schema DBT_SNOWPARK_DB.RAW to role DBT_SNOWPARK_ROLE ;

-- Role Grants on Warehouse
grant usage, operate on warehouse DBT_SNOWPARK_WH to role DBT_SNOWPARK_ROLE ;

Grant the new role to yourself:

use role SECURITYADMIN ;
grant role DBT_SNOWPARK_ROLE to user <YOUR_USER> ;
ℹ️
In a real-life scenario, you could create a Role and a Virtual Warehouse to use exclusively for local/dev runs, and a separate Role and Warehouse only for Snowpark runs. This would provide better isolation of compute credit consumption (see the sketch below).
For the sake of simplicity, in this tutorial we are using the same resources for everything.
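
For reference, a minimal sketch of that split could look like this (illustrative names, not used anywhere else in this tutorial):

use role SYSADMIN ;
create warehouse DBT_DEV_WH warehouse_size = 'XSMALL' auto_suspend = 60 ;
create warehouse DBT_PROD_WH warehouse_size = 'XSMALL' auto_suspend = 60 ;

use role SECURITYADMIN ;
create role DBT_DEV_ROLE ;  -- for local/dev runs
create role DBT_PROD_ROLE ; -- for Snowpark runs
grant usage, operate on warehouse DBT_DEV_WH to role DBT_DEV_ROLE ;
grant usage, operate on warehouse DBT_PROD_WH to role DBT_PROD_ROLE ;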

Setting up a working dbt repository

Create a working directory and clone the repo

Start by creating a new working directory on your machine, and clone the jaffle-shop dbt repository into it.

mkdir dbt-snowpark
cd dbt-snowpark
git clone git@github.com:dbt-labs/jaffle-shop.git

Create a virtual env and install dbt

Create a virtual environment called venv using the venv module, and activate it in your terminal. Run these commands from your root directory dbt-snowpark:

python -m venv venv
source venv/bin/activate

Install dbt-snowflake in your environment:

pip install git+https://github.com/dbt-labs/dbt-snowflake.git@v1.9.0b1
The command above is equivalent to pip install dbt-snowflake, except that it points to a specific repository tag (v1.9.0b1).

To confirm everything is installed correctly, check the dbt version in your terminal. Run the command below:

dbt --version

You should have a response somewhat similar to this:

Core:
  - installed: 1.8.7
  - latest:    1.8.7 - Up to date!

Plugins:
  - snowflake: 1.9.0b1

Create a local dbt target

Store your credentials in your local profiles.yml file. Open ~/.dbt/profiles.yml (or the equivalent if you are not using Mac/Linux), and add the following block:

jaffle_shop:
  target: local
  outputs:
    local:
      type: snowflake
      account: <YOUR_SNOWFLAKE_ACCOUNT> # Example: ab12345.eu-west-1.aws
      user: <YOUR_SNOWFLAKE_USER>
      password: <YOUR_SNOWFLAKE_PASSWORD> # In case you use basic Auth
      database: DBT_SNOWPARK_DB
      schema: JAFFLE_SHOP_LOCAL
      role: DBT_SNOWPARK_ROLE
      warehouse: DBT_SNOWPARK_WH
      threads: 8
      client_session_keep_alive: False
      authenticator: username_password_mfa # In case you have MFA enabled
🔐
Notice the authenticator line in the block above? We are using MFA and you should too! It is strongly recommended to use MFA in all Snowflake user accounts.

Adjust the dbt project file

Before we can run dbt, we need to make small tweaks to the dbt_project.yml file.

  1. Ensure the source data from the project is loaded to Snowflake. This data is stored as CSV files in the folder jaffle-shop/jaffle-data. We need to tell dbt to scan this directory for seeds, so add jaffle-data to the existing seed-paths.
  2. Update the profile attribute to jaffle_shop. This will ensure the project is mapped to the credentials you added in your local profiles.yml.

After the changes, your file should look like this:
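
For reference, the relevant lines of dbt_project.yml would look something like this (the remaining settings stay as they are in the jaffle-shop repo, and the exact seed-paths entry may differ slightly):

name: jaffle_shop
profile: jaffle_shop

seed-paths: ["seeds", "jaffle-data"]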

Run dbt locally

In case you already closed your terminal, open a new one, activate your venv, and navigate to the jaffle-shop folder:

source venv/bin/activate
cd jaffle-shop

Before running any dbt commands, you need to point dbt to the local profile you created. For that, export a special dbt environment variable:

export DBT_PROFILES_DIR=~/.dbt/

This is the path to the directory that contains the profiles.yml file in which the local profile was added.

Now you can run dbt debug:

dbt debug

You should see something like this:

12:45:37  Running with dbt=1.8.7
(...)
12:45:37   - git [OK found]
(...)
12:45:38    Connection test: [OK connection ok]

12:45:38  All checks passed!

Run dbt deps to install all dependencies:

dbt deps

And finally, run dbt build, a command that encapsulates dbt seed, dbt run, and dbt test.

dbt build --target local
ℹ️
The --target flag is not strictly needed here, as it defaults to local. We include it to be explicit.

By the end, you should be able to see the following objects in Snowflake:

  • the dbt models
  • the raw data from the seeds

And voilà! We have a working dbt repository 🎉

Containerise the dbt repository

The next step is to prepare a dbt Docker image for Snowpark to use.

Note about Snowpark Authentication to Snowflake

This is perhaps the trickiest part of the whole guide. To run dbt using Snowpark, we need to ensure Snowpark can connect to Snowflake using a special Service User that only exists while the container is running. For that, we need to understand how this user will authenticate.

According to Snowflake’s documentation, the only recommended method for Snowpark to authenticate to Snowflake is via OAuth.

Whenever a new Snowpark container runs, it includes two environment variables and an OAuth token file for the Snowflake client to use when connecting. In our case, the client that needs to authenticate is our dbt-snowflake package.

The env vars are SNOWFLAKE_ACCOUNT and SNOWFLAKE_HOST. The token file is stored at /snowflake/session/token and contains an OAuth token that can be used for authentication.

⚠️
Using Snowflake’s basic auth (user + password) or key-pair authentication is also possible, but keep in mind this is not recommended. OAuth is the only way for Snowpark to access Snowflake objects while remaining in Snowflake’s internal network. Other auth methods require Snowpark to reach the Snowflake account via the internet, which needs additional networking configuration in Snowpark and, since the connection goes over the internet, is less secure.

Create a snowpark dbt target

So far we have used a local target with our own user accounts. Now we want Snowpark to authenticate, and we also want to use a different schema for our Snowpark runs.

This time, we are going to create a new file called profiles.yml in the root of our dbt project, and add the new target:

jaffle_shop:
  outputs:
    snowpark:
      type: snowflake
      authenticator: oauth
      host: "{{ env_var('SNOWFLAKE_HOST') }}"
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      token: "{{ env_var('SNOWFLAKE_TOKEN') }}"
      database: "DBT_SNOWPARK_DB"
      schema: "JAFFLE_SHOP_SNOWPARK"
      role: "DBT_SNOWPARK_ROLE"
      warehouse: "DBT_SNOWPARK_WH"
      threads: 8

As you can see in the block, the host, account and token are expected as environment variables. But you might also have noticed that Snowpark provides only SNOWFLAKE_HOST and SNOWFLAKE_ACCOUNT as environment variables, not the token… How do we fix this?

Create a Docker Entrypoint Python File

The fact that the OAuth Token is stored as a file and not as an environment variable makes things a bit more difficult.

The approach we are going to take is to create a Docker entrypoint: a Python script that is executed when the image runs. It will include the logic to extract that token from its file, and it will also let us specify the dbt command we want to execute every time a new Snowpark container starts. We will create this script as entrypoint.py inside the jaffle-shop folder, so it is copied into the Docker image together with the dbt project.

We start by writing a method to retrieve the Snowflake OAuth Token:

def get_snowflake_oauth_token() -> str:
    with open("/snowflake/session/token", "r") as f:
        return f.read()

You can then use the method to add the token to the Env Variables:

import os

os.environ["SNOWFLAKE_TOKEN"] = get_snowflake_oauth_token()

Assuming the dbt command is passed in the command arguments, here is how to parse it:

import argparse

def parse_command():
    parser = argparse.ArgumentParser()
    parser.add_argument("--command", type=str, default="dbt debug")

    args = parser.parse_args()
    command = args.command

    return command

And we can invoke dbt programmatically using the dbt package:

from dbt.cli.main import dbtRunner

def run_dbt_command(command: str) -> None:
    dbt = dbtRunner()

    cli_args = command.split(" ")
    cli_args.remove("dbt")  # "dbt" not required to be in the list

    result = dbt.invoke(cli_args)

    if result.exception:
        raise Exception(f"dbt command failed! Error: {result.exception}")

    for r in result.result:
        logger.info(f"{r.node.name}: {r.status}")

Tidying things up:

import argparse
import logging
import os
import sys

from dbt.cli.main import dbtRunner

logging.basicConfig(stream=sys.stdout, level=logging.INFO)  # To enable logs in Snowpark
logger = logging.getLogger(__name__)

def get_snowflake_oauth_token() -> str:
    with open("/snowflake/session/token", "r") as f:
        return f.read()

def parse_command():
    parser = argparse.ArgumentParser()
    parser.add_argument("--command", type=str, default="dbt debug")

    args = parser.parse_args()
    command = args.command

    return command

def run_dbt_command(command: str) -> None:
    dbt = dbtRunner()

    cli_args = command.split(" ")
    cli_args.remove("dbt")  # "dbt" not required to be in the list

    result = dbt.invoke(cli_args)

    if result.exception:
        raise Exception(f"dbt command failed! Error: {result.exception}")

    for r in result.result:
        logger.info(f"{r.node.name}: {r.status}")

def main():
    os.environ["SNOWFLAKE_TOKEN"] = get_snowflake_oauth_token()
    dbt_command = parse_command()

    logger.info(f"Running command '{dbt_command}'...")

    run_dbt_command(dbt_command)

if __name__ == "__main__":
    main()

In summary, the script lets you supply a dbt command and executes it while handling the chores (such as fetching the OAuth token).

Example of an entrypoint.py run:

python entrypoint.py --command "dbt run --target snowpark"
⚠️
Notice that if you run entrypoint.py locally, it won’t work! The script is designed to run only inside Snowpark: the code assumes the Snowflake env vars are loaded and that the token file exists at /snowflake/session/token, which is not the case in a local setup.

Add a requirements.txt file to the repo

This will help the containerisation process by providing the required Python dependencies.

Create a file called requirements.txt in your root folder, and include the dbt-snowflake package:

git+https://github.com/dbt-labs/dbt-snowflake.git@v1.9.0b1

You may notice there is also a requirements.txt file inside the dbt project folder. That file won’t be used, so you can delete it.

ℹ️
Consider using a packaging and dependency management tool instead of creating a requirements file manually! Tasman loves Poetry and uses it in most Python projects. It was not used in this tutorial to keep things simple.

Create a Dockerfile

With an entrypoint file and a dbt profile in place, the next step is a Dockerfile. Create a file called Dockerfile in the root folder:

FROM python:3.11-slim

WORKDIR /usr

ENV DBT_PROJECT_DIR=/usr/dbt
ENV DBT_PROFILES_DIR=$DBT_PROJECT_DIR
ENV DBT_PROJECT_DIR_NAME=jaffle-shop
ENV PATH="/usr/.venv/bin:$PATH"

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

COPY ${DBT_PROJECT_DIR_NAME} ${DBT_PROJECT_DIR}

WORKDIR ${DBT_PROJECT_DIR}

RUN dbt deps

ENTRYPOINT ["python", "/usr/dbt/entrypoint.py" ]

In a nutshell, what this Dockerfile does:

  • Uses the python:3.11-slim image, a lightweight version of the official Python image
  • Sets environment variables pointing dbt to the project and profiles directories
  • Copies the requirements file and installs the dependencies using pip
  • Copies the dbt project and installs its packages with dbt deps
  • Sets the image entrypoint to the entrypoint.py script we added to the repo

To build the Docker image, run the following command:

docker buildx build -t dbt-snowpark .
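
One caveat worth flagging: Snowpark Container Services runs linux/amd64 images, so if you are building on an ARM machine (e.g. Apple Silicon) you may need to target that platform explicitly:

docker buildx build --platform linux/amd64 -t dbt-snowpark .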

Now, as stated before, you can try running a container based on this image, but this won’t work because the image is designed to work only with Snowpark. The Snowpark variables and token file are not in place locally. If you want to give it a try:

docker run --rm dbt-snowpark

The error you get should look something like this:

FileNotFoundError: [Errno 2] No such file or directory: '/snowflake/session/token'

As expected. But once you run this using Snowpark, this won’t happen.

Setting up Snowflake objects for Snowpark

Now that we have a Docker image ready to use, we have to create the necessary components in Snowflake for Snowpark to be able to run the container.

Compute Pool

A Compute Pool is a collection of one or more Virtual Machine (VM) nodes in which Snowflake runs your Snowpark Containers. In this case, this will provide the computing power to be used by the dbt job.

Let’s start by creating the Compute Pool. This can only be done by ACCOUNTADMIN, or alternatively by a custom role with the CREATE COMPUTE POOL privilege on the account.

Here is how to do it:

use role ACCOUNTADMIN ;

create compute pool if not exists DBT_SNOWPARK_COMPUTE_POOL
	MIN_NODES = 1
	MAX_NODES = 1
	INSTANCE_FAMILY = CPU_X64_XS 
;

You can now check the details on the resource:

describe compute pool DBT_SNOWPARK_COMPUTE_POOL;

Finally, we want to ensure our custom role can use the Compute Pool we just created.

use role SECURITYADMIN ;
grant USAGE, OPERATE on compute pool DBT_SNOWPARK_COMPUTE_POOL to role DBT_SNOWPARK_ROLE ;

ℹ️
Since dbt is generally lightweight and this is just an exercise, we are sticking to the smallest instance (XS) and 1 node. Read more about the different Compute Pool types and sizes in the Snowflake Documentation.
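
Also keep in mind that a Compute Pool keeps its nodes running (and consuming credits) while active. If you want to keep costs down between runs, the OPERATE privilege granted above should let you suspend and resume the pool manually, for example:

use role DBT_SNOWPARK_ROLE ;

alter compute pool DBT_SNOWPARK_COMPUTE_POOL suspend ;
-- ... and later, before the next run
alter compute pool DBT_SNOWPARK_COMPUTE_POOL resume ;

Compute Pools also support an auto-suspend setting; check the Snowflake Documentation for details.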

Image Repository

The next step is to create an Image Repository in Snowflake. This is where we will upload the Docker image we created.

The Image Repository lives inside a database schema. We previously created a schema called JAFFLE_SHOP_SNOWPARK, which is meant to hold all Snowpark dbt related artifacts, so the Image Repository fits well there.

This is how to do it:

use role DBT_SNOWPARK_ROLE ;
create image repository DBT_SNOWPARK_DB.JAFFLE_SHOP_SNOWPARK.IMAGE_REPOSITORY ;

Confirm it was successfully created by running this:

show image repositories in schema DBT_SNOWPARK_DB.JAFFLE_SHOP_SNOWPARK ;

The query result will contain a value repository_url which should look similar to this:

xxxx-xxxx.registry.snowflakecomputing.com/dbt_snowpark_db/jaffle_shop_snowpark/image_repository

Take note of this value, as we will use it in the next steps.

Push Docker image to Snowpark

Start by getting the Registry Name. This is the first part of the repository_url noted above. It should look like this: xxxx-xxxx.registry.snowflakecomputing.com.

The docker login command looks like this:

docker login xxxxx-xxxxx.registry.snowflakecomputing.com -u <YOUR_USER>

Run the command and type your password when prompted.

ℹ️
If you have MFA enabled on your account, you might notice that the login process can be a bit flaky! Consider temporarily bypassing MFA on your account while following this tutorial, and use a service account for this in a production scenario.

Tag the Docker image with the repository_url value, followed by the image name:

docker tag dbt-snowpark xxxx-xxxx.registry.snowflakecomputing.com/dbt_snowpark_db/jaffle_shop_snowpark/image_repository/dbt-snowpark

And push:

docker push xxxx-xxxx.registry.snowflakecomputing.com/dbt_snowpark_db/jaffle_shop_snowpark/image_repository/dbt-snowpark

You can run the following query to confirm the image has been pushed:

select SYSTEM$REGISTRY_LIST_IMAGES('/dbt_snowpark_db/jaffle_shop_snowpark/image_repository');

It should return something like this:

{"images":["dbt-snowpark"]}

Voilà! Your image is now in Snowflake 🎉

Running containerised dbt in Snowpark

Execute the Service

This is the moment of truth! Let’s trigger a containerised Snowpark dbt run. For that, we are going to run the following SQL command (replace the image URL with your own repository_url):

use role DBT_SNOWPARK_ROLE ;

execute job service
  in compute pool DBT_SNOWPARK_COMPUTE_POOL
  name = 'dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job'
  from specification $$
    spec:
      containers:
      - name: dbt-job
        image: xxxx-xxxx.registry.snowflakecomputing.com/dbt_snowpark_db/jaffle_shop_snowpark/image_repository/dbt-snowpark
        command: [ "python", "/usr/dbt/entrypoint.py", "--command", "dbt build --target snowpark" ]
  $$
;

Once finished, you should be able to see your raw data and models in JAFFLE_SHOP_SNOWPARK schema! 🎉

ℹ️
The execute job service command is synchronous. Once you trigger it, the query only finishes once the container exits (successfully or not).

Under the Hood

The command execute job service will spin up the Docker container according to the instructions in the specification block, and the query finishes once the container exits (regardless of success or failure).

The service and its container will still be available for some minutes (in our experience between 30 and 60 minutes), after which Snowflake performs a cleanup by dropping the service. You can access the details by running the following query:

show service containers in service dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job ;

You can also access the logs produced by the container using this query (already parsed):

-- Returns the last 100 log lines
with service_logs as (
    select system$get_service_logs('dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job', 0, 'dbt-job', 100) as logs
)
    select value as log
    from service_logs, lateral split_to_table(logs, '\n')
;

In this case, the query output will be familiar to anyone who has used dbt before.

If you want to run the job a second time before the cleanup has finished, you won’t be able to! In that case, force the cleanup yourself by running a drop service statement:

use role DBT_SNOWPARK_ROLE ;

drop service if exists dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job ;

After this, you will be able to trigger the job again.

ℹ️
In a real-life scenario, you might want to always run a drop service if exists statement before executing your service.

Additional Considerations

  • The Snowflake user executing the service is a special Service User that exists only while the service is running, using the role DBT_SNOWPARK_ROLE.
  • The dbt service is executed on the compute pool we created earlier.
  • The SQL queries triggered by dbt are also run by the special Service User, under the role DBT_SNOWPARK_ROLE, using the virtual warehouse DBT_SNOWPARK_WH.
    • This is all specified in the profiles.yml file that we created before.
    • The container authenticates to Snowflake using the OAuth token and env vars set up in the entrypoint.py file, assuming the role DBT_SNOWPARK_ROLE.

ℹ️
For more information on how Job Services work, check the Snowflake Documentation on Job Service execution.

Logs, Scheduling, Error Notifications

Now that we have a way for Snowpark to execute dbt, we can add the extra functionality needed to make the setup production-ready.

Access Persistent Logs

You can use an Event Table to access all logs even after the services are dropped.

The following query will give you all logs from past jobs:

use role ACCOUNTADMIN ;

select TIMESTAMP, VALUE, RESOURCE_ATTRIBUTES
from <YOUR_EVENT_TABLE>
where RESOURCE_ATTRIBUTES:"snow.service.name" = 'DBT_SNOWPARK_JOB'
and VALUE != ''
order by timestamp desc
limit 100;

As an idea, this SQL logic could be captured in a view to simplify its usage.
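
For example, a minimal sketch of such a view (the name is illustrative, and creating it requires a role with access to the event table):

use role ACCOUNTADMIN ;

create or replace view DBT_SNOWPARK_DB.JAFFLE_SHOP_SNOWPARK.DBT_SNOWPARK_LOGS as
    select TIMESTAMP, VALUE as LOG, RESOURCE_ATTRIBUTES
    from <YOUR_EVENT_TABLE>
    where RESOURCE_ATTRIBUTES:"snow.service.name" = 'DBT_SNOWPARK_JOB'
    and VALUE != ''
;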

The column RESOURCE_ATTRIBUTES also contains more details on each Snowpark Container run. It is especially useful to filter by snow.service.container.run.id to isolate the logs from a single run:

use role ACCOUNTADMIN ;

select TIMESTAMP, VALUE, RESOURCE_ATTRIBUTES
from <YOUR_EVENT_TABLE>
where RESOURCE_ATTRIBUTES:"snow.service.container.run.id" = '3609f3'
and VALUE != ''
order by timestamp desc
limit 100;

ℹ️
New Snowflake accounts already come with an Event Table activated out of the box. If your account does not have this enabled, you have to do it manually. Also notice we are using ACCOUNTADMIN to query this table: the Event Table contains logs for everything happening in the Snowflake account! In a real-life scenario you might want to use RBAC to ensure different (row-level) access for different users. For more information on how to enable and manage Event Tables, check the Snowflake Documentation.
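
If you do need to set up an Event Table manually, a minimal sketch could look like this (the table name and location are illustrative):

use role ACCOUNTADMIN ;

-- Create the event table (place it wherever fits your setup)
create event table if not exists DBT_SNOWPARK_DB.RAW.EVENTS ;

-- Make it the account's active event table so Snowpark logs land there
alter account set EVENT_TABLE = DBT_SNOWPARK_DB.RAW.EVENTS ;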

Schedule Runs with Error Notifications

Enabling scheduled runs and triggering error notifications, all within Snowflake, closes the loop on the work we have done so far. For this, we are going to use a combination of Snowflake Procedures, Tasks, and a Notification Integration.

Let’s start by creating a Notification Integration to allow Snowflake to send emails to a set of users.

use role ACCOUNTADMIN ;

create notification integration dbt_snowpark_failure_notification_integration
    type = EMAIL
    enabled = TRUE
    allowed_recipients = ('email1@email.test','email2@email.test')
;

You can test sending emails using the following procedure:

call SYSTEM$SEND_EMAIL(
    'dbt_snowpark_failure_notification_integration', -- Notification Integration
    'email1@email.test, email2@email.test',          -- Recipient List 
    'Test Subject',                                  -- Email Subject
    'Test Email Body'                                -- Email Content
);
💡
The email addresses used in the Notification Integration need to be verified within the Snowflake Console! In case you want to send emails to an email that does not belong to a user (e.g. a mailing list, or a Slack channel email address), consider creating a Snowflake user solely for that purpose.

Now, let’s ensure our custom role has enough permissions to send emails:

use role ACCOUNTADMIN ;
grant usage on integration dbt_snowpark_failure_notification_integration to role DBT_SNOWPARK_ROLE ;

With email notifications ready, let’s create a Snowflake Procedure to trigger the job.

use role DBT_SNOWPARK_ROLE ;
use schema DBT_SNOWPARK_DB.JAFFLE_SHOP_SNOWPARK ; -- schema where the procedure will live

create or replace procedure run_dbt_snowpark()
  returns string
  language sql
  execute as caller
as
declare
    error_message STRING ;
begin
    -- 1. Clean Service if needed
    drop service if exists 
        dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job ;

    -- 2. Execute Job
    execute job service
      in compute pool DBT_SNOWPARK_COMPUTE_POOL
      name = 'dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job'
      from specification $$
        spec:
          containers:
          - name: dbt-job
            image: xxxx-xxxx.registry.snowflakecomputing.com/dbt_snowpark_db/jaffle_shop_snowpark/image_repository/dbt-snowpark
            command: [ "python", "/usr/dbt/entrypoint.py", "--command", "dbt build --target snowpark" ]
      $$;
      return 'Snowpark dbt job completed successfully!' ;
exception
  -- Send an email when the job fails
  when OTHER then
    error_message := 'Snowpark dbt Run failed.\n' || 'SQL Code: ' || SQLCODE || '\n' || 'Error Message: ' || SQLERRM || '\n' || 'SQL State: ' || SQLSTATE || '\nPlease check the Event Table to access the dbt logs.';
    
    call SYSTEM$SEND_EMAIL(
       'dbt_snowpark_failure_notification_integration',
        'email1@email.test, email2@email.test',
        'Snowpark dbt job Failure',
        :error_message 
     );

    return error_message;
end;

What does this Procedure do?

  1. Performs a Service cleanup (if exists)
  2. Executes the Snowpark Service Job

In case any of the above steps fails, an email is sent to the provided recipients, letting them know that the job has failed and including some error information.

You can test a manual run of this procedure by running:

use role DBT_SNOWPARK_ROLE ;
call run_dbt_snowpark() ;

The only part missing now is scheduling! And for that, we are going to use a Snowflake Task.

Let’s start by granting the necessary permissions to our custom role:

use role ACCOUNTADMIN ;
grant EXECUTE TASK on account to role DBT_SNOWPARK_ROLE ;

use role SECURITYADMIN ;
grant CREATE TASK on schema DBT_SNOWPARK_DB.JAFFLE_SHOP_SNOWPARK to role DBT_SNOWPARK_ROLE ;

And now let’s create our task:

use role DBT_SNOWPARK_ROLE ;

create or replace task DBT_SNOWPARK_DAILY_RUN_TASK
  warehouse = DBT_SNOWPARK_WH
  schedule = 'USING CRON 0 7 * * * Europe/Amsterdam'
  AS
    call run_dbt_snowpark()
;

In this case, the task will trigger the procedure every day at 7 AM in the Europe/Amsterdam timezone. You can adapt the CRON expression to the desired frequency and create as many Tasks as required! You can check the CRON GURU website to learn more about CRON expressions.
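
One more detail: newly created tasks are suspended by default, so the schedule only takes effect after you resume the task:

use role DBT_SNOWPARK_ROLE ;
alter task DBT_SNOWPARK_DAILY_RUN_TASK resume ;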

Bonus

Executing the Job using Snowflake CLI

The job can also be triggered ad-hoc using a terminal/shell script. This can be quite useful when scheduling the job from a 3rd party tool (think orchestrators like Airflow, Dagster, Prefect), or when running jobs in a CI/CD pipeline (e.g. for testing purposes).

To do this, you need to create a YAML spec file. You can take the contents of the specification block from the execute job service query and save them in a YAML file.

For this case, a file called snowpark_dbt_spec.yaml was created in the root folder of the repository:
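
Based on the specification block we used in the execute job service query, its contents could look like this (swap in your own repository_url):

spec:
  containers:
  - name: dbt-job
    image: xxxx-xxxx.registry.snowflakecomputing.com/dbt_snowpark_db/jaffle_shop_snowpark/image_repository/dbt-snowpark
    command: [ "python", "/usr/dbt/entrypoint.py", "--command", "dbt build --target snowpark" ]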

ℹ️
As you can see, you can use different spec files for different jobs (e.g. one for dbt runs and one for dbt tests, or different specs for different run frequencies).

Then all you need to do is run the following command:

snow spcs service execute-job \
    dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job \
    --compute-pool DBT_SNOWPARK_COMPUTE_POOL \
    --spec-path snowpark_dbt_spec.yaml \
    --role DBT_SNOWPARK_ROLE \
    --connection <YOUR_CONNECTION>

And you can do the cleanup as well:

snow spcs service drop \
    dbt_snowpark_db.jaffle_shop_snowpark.dbt_snowpark_job \
    --role DBT_SNOWPARK_ROLE \
    --connection <YOUR_CONNECTION>

Conclusion

In this tutorial, we covered how to set up a sample dbt project, run it locally, containerise it, and run it in Snowpark. We finished off with scheduling the job, logging, and alerting in case of failures.

A lot more could be done to improve the setup! One idea, taking the concept of Snowflake as a single solution even further, is to leverage native Streamlit Apps to build operational dashboards, cost monitoring, and even serve HTML content such as the dbt docs and the Elementary report.

Since Snowpark is still relatively new, the ecosystem around it is not fully developed yet, which might be limiting in some ways. For example, you still cannot use Infrastructure-as-Code to create Snowpark resources like Compute Pools or Image Repositories (though Terraform is currently working on it, so we can count on it soon).

We hope this is useful, and if you have nice Snowpark tips, let us know so we can include them in the article!