2024 Mastering dbt (Data Build Tool) - From Beginner to Pro
Written on: January 01, 2024
2024 Mastering dbt (Data Build Tool) - From Beginner to Pro - Hands-on Analytics Engineering Bootcamp Including Theory, Building a dbt Project from Scratch, & Deploying to dbt Cloud
- How to build a complete dbt project from scratch
- The main benefits of dbt, and a bit of background as to how it came about
- All of the dbt fundamentals: sources, models, tests, documentation, snapshots, seeds, macros, hooks, and operations
- How to structure a dbt project: staging, intermediate, and mart models - and naming conventions
- How to version control changes to your code with GitHub and VSCode
- Advanced dbt testing - creating your own custom singular & generic tests, setting severity, and setting warn/error thresholds
- Advanced dbt data modelling - model materialisation and governance (access, contracts, and versions)
- Advanced dbt commands - how to use different selectors, profiles, tags, and indirect test selection, and how to build a local dbt documentation site
- Advanced dbt Jinja & macros - creating your own macros to use in hooks, functions, and operations, using Jinja for loops and variables, and the target variable
- How to deploy your project on dbt Cloud, how to use the dbt Cloud UI, and using environment variables
- How to use tests & macros from external packages to supercharge your dbt project
- Best practices to use when running a dbt project (based on lots of experience!)
- How to create a complete setup for Mac or Windows: installing all of the tools and getting a dbt-specific VSCode setup!
Description
A complete course to help anyone with basic SQL skills learn advanced dbt, a key tool for Analytics Engineering!
Welcome to the 2024 Mastering dbt (data build tool) course! This course runs through everything from the theory behind dbt to building an advanced dbt project (from scratch) and deploying it on dbt Cloud.
I have over 8 years of experience across Analytics / Analytics Engineering / Data Science, including 4 years using dbt on a daily basis. I was also involved in the rollout of dbt in my time at Monzo Bank!
In this course I've taken everything I've learnt over the past 4 years, and what I use on a daily basis, and condensed it into a course that takes anyone who knows SQL to an advanced level in dbt as quickly as possible.
COURSE UPDATES:
April 2023: More content added for setup using PowerShell (Windows)
May 2023: New content - dbt version 1.5 (released April 2023)
June 2023: Added overview & recap lectures to all sections to reinforce what we've learned
August 2023: Updated to dbt version 1.6 - added lectures on model governance (access, contracts, and versions)
MY APPROACH TO THIS COURSE:
We'll cover everything you need to know about dbt: from basic data modelling right through to advanced features such as creating custom tests and macros. We'll do this step by step, building up from the basics.
It's focused on practical outcomes - we won't spend ages on database theory or go into lots of detail on the eCommerce dataset we'll be using. Instead, we'll aim to get you to an advanced dbt level as quickly as possible.
For every video where we're writing code, I've created lesson attachments with the final outputs. This means you can either code as you go along, or watch the videos and look at the handouts afterwards! I've also included some theory with these handouts to help hammer home the points made in the videos.
There's also a public GitHub repository (which you'll be using for this course) that contains a finished version of the project you can reference throughout.
This course isn't static! I'd love to hear your feedback and will be updating this course on an ongoing basis.
COURSE STRUCTURE:
This course focuses on first getting a good understanding of what problems dbt solves, then building a basic dbt project, before layering on more advanced concepts and finally deploying our project with dbt Cloud.
Introduction
Some theory (<1 hour) around dbt, what problems existed in the data stack before it came along, and how it solves them.
Tool setup
Getting set up with Python, GitHub, Google BigQuery, VSCode, and of course dbt! If you're already familiar with any of these tools, you're more than welcome to skip the relevant lessons.
We'll also be exploring the fictional eCommerce dataset that we'll be using throughout the course.
Building our basic dbt project
This section focuses on creating our project from scratch, including how we will structure our project.
We'll be building out staging (stg), intermediate (int), and mart data models, including documentation & testing with the out-of-the-box dbt tests.
Advanced dbt testing
We'll start to build on our basic dbt project by setting test severity & thresholds, using the dbt-utils and dbt-expectations external packages for their excellent selection of tests, creating our own custom singular & generic tests, and testing the freshness of our source data.
Advanced data modelling with dbt
Next, we'll be looking at how we can create reusable documentation, seed files (version-controlled .csv files), snapshots (capturing changes to data tables), and materialisation methods.
Most of this section will focus on the last part, the materialisation methods: ephemeral, view, table, and incremental. By this point we'll have encountered view and table models, so we'll build both an incremental and an ephemeral model, and you'll gain an understanding of which to use and when.
This section also covers the model governance features introduced in dbt version 1.5: model access, groups, contracts, and versions.
Advanced dbt commands
This section focuses less on changing our dbt project and more on the major dbt commands and how (and when) to use them.
Advanced Jinja & macros
The final changes to our project will involve using Jinja (a core feature of dbt, and arguably its most complex but most powerful one) to create our own macros.
This section will run through how you can use Jinja macros for hooks, operations, and as reusable functions in your SQL models. It'll also cover some theory around Jinja, common mistakes, and what I personally find it most useful for!
dbt Cloud
Finally, we'll be exploring how to take our project and deploy it on dbt Cloud - including how to schedule it to run on a regular basis. We'll also be looking at dbt Cloud itself and its main benefits.
Who this course is for:
- Data Analysts
- Data Scientists
- Analytics Engineers
- Data Engineers
- BI Professionals
- Anyone interested in getting into data!