Databricks Data Engineer Associate Certification Overview

I’ve been looking at taking the Databricks Data Engineer Associate Certification (the Databricks page for the certification is here). The certification covers areas such as the overall Databricks platform, Delta Lake and Delta Live Tables, SQL endpoints, dashboarding, and orchestration – perfect for any work I would do in Databricks. It targets people who have up to 6 months’ experience working with Databricks. There is also the Spark Developer certification, but for people like me who are Spark-light in terms of skillset, that one can seem quite daunting.

Why do this exam? Well, I like the inclusion of Delta Lake, which can be used in many engines…such as Serverless SQL Pools in Synapse Analytics, so being able to focus on how to work with Delta is very useful. Delta Live Tables is also covered, a great addition to the platform. I like the Just Enough Python section too: I’m not much of a Python programmer, so being given guidance on the relevant areas to concentrate on is very useful. Overall the exam gives a great overview of the Databricks platform and keeps it very relevant to the Lakehouse pattern. Plus it’s not heavy on the actual Spark implementation – very useful for people like me!

I also like to see the list of skills expected in the certification. This might sound obvious, but I like to know what a “Databricks Data Engineer” might need to know (at least at this level of experience). For example, if you want to work as an Azure Data Engineer, what skills might you need? What platforms and features should you use? The Microsoft DP-203 exam lists out all the expected skills and platforms, which really gives you a focus (and boundary) on what you need to know – great for learning.

Let’s dive into the exam details and expected skills for the Databricks Certified Data Engineer Associate. I won’t be providing links to learning for each and every item, but rather some overall resources to look at.

Skills Measured

  • Databricks Lakehouse Platform (24%)
  • ELT with Spark SQL and Python (29%)
  • Incremental Data Processing (22%)
  • Production Pipelines (16%)
  • Data Governance (9%)

Exam Details

  • Time: 90 minutes
  • Passing score: 70%
  • Fee: $200
  • Retake policy: Any time but must pay fee again
  • Questions: 45

Learning Resources

Exam Versions

As of November/December 2022 there are two versions of the exam, V2 and V3. I have listed the skills required for the V2 version of the exam, as V3 has only just been released. On the Databricks Academy site, there are V2 and V3 courses to align with each exam version. Of note is that Structured Streaming and Auto Loader content has been dropped from V3. Thanks to Kevin Chant for clarifying the exam versions.

You will need to register with the exam platform to book the exam here.

The choice is yours in terms of which version to study for; the V2 exam is available until April 2023.

Databricks Academy

There are courses available on the Databricks Academy website. You’ll need to register with the academy here.

Let’s now break down all the skills measured by category. This list was compiled directly from the Certification Overview Video on the Databricks Academy website. I find it very useful to list out all the skills measured, from the overall category down to the detailed items.

Databricks Lakehouse Platform (24%)

Lakehouse

Describe the components of the Databricks Lakehouse.

  • Lakehouse concepts
    • Lakehouse vs Data Warehouse
    • Lakehouse vs Data Lake
    • Data quality improvements
  • Platform architecture
    • High-level architecture & key components of a workspace deployment
    • Core services in a Databricks workspace deployment
  • Benefits of the Lakehouse to Data Teams
    • Organisational data problems that Lakehouse solves
    • Benefits to different roles in a Data team

Data Science & Engineering Workspace

Complete basic code development tasks using services of the Databricks Data Engineering & Data Science Workspace.

  • Clusters
    • All-purpose clusters vs Job clusters
    • Cluster instances & pools
  • DBFS
    • Managing permissions on tables
    • Role permissions and functions
    • Data Explorer
  • Notebooks
    • Features & limitations
    • Collaboration best practices
  • Repos
    • Supported features & Git operations
    • Relevance in CI/CD workflows in Databricks

Delta Lake

  • General concepts
    • ACID transactions on a data lake
    • Features & Benefits of Delta Lake
  • Table Management & Manipulation
    • Creating tables
    • Managing files
    • Writing to tables
    • Dropping tables
  • Optimisations
    • Supported features & benefits
    • Table utilities to manage files
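
To make the Table Management & Manipulation and Optimisations items concrete, here’s a minimal sketch of the kind of Delta Lake operations involved, assuming a Databricks notebook where `spark` is predefined. The table and column names are purely illustrative:

```python
# Create a Delta table (USING DELTA makes the format explicit)
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales (
        id BIGINT,
        amount DOUBLE,
        sale_date DATE
    ) USING DELTA
""")

# Each write is an ACID-transactional commit on the data lake
spark.sql("INSERT INTO sales VALUES (1, 9.99, '2022-11-01')")

# The transaction log is queryable: one row per commit
spark.sql("DESCRIBE HISTORY sales").show()

# Table utilities to manage files: compact small files, then clean up
spark.sql("OPTIMIZE sales ZORDER BY (id)")
spark.sql("VACUUM sales")  # removes files outside the retention window

spark.sql("DROP TABLE sales")
```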

ELT with Spark SQL & Python (29%)

  • Build ELT pipelines using Spark SQL & Python
    • Relational entities (databases, tables, and views)
    • ELT (Creating tables, writing data to tables, transforming data, & UDFs)
    • Manipulating data with Spark SQL & Python

Relational Entities

Leverage Spark SQL DDL to create and manipulate relational entities on Databricks.

  • Databases
    • Create databases in specific locations
    • Retrieve locations of existing databases
    • Modify and delete databases
  • Tables
    • Managed vs External Tables
    • Create and drop managed and external tables
    • Query and modify managed and external tables
  • Views and CTEs
    • Views vs Temporary Tables
    • Views vs Delta Lake tables
    • Creating Views and CTEs
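
As a sketch of what those DDL skills look like in practice (the database, table, and path names below are hypothetical, and `spark` is the notebook-provided session):

```python
# Create a database in a specific location, then retrieve that location
spark.sql("CREATE DATABASE IF NOT EXISTS demo_db LOCATION 'dbfs:/custom/demo_db.db'")
spark.sql("DESCRIBE DATABASE demo_db").show()

# Managed table: Databricks owns the underlying files; DROP TABLE deletes them
spark.sql("CREATE TABLE demo_db.managed_tbl (id INT, name STRING) USING DELTA")

# External table: data lives at a path you control; DROP TABLE leaves the files
spark.sql("""
    CREATE TABLE demo_db.external_tbl (id INT, name STRING)
    USING DELTA LOCATION 'dbfs:/custom/external_tbl'
""")

# Temporary view: session-scoped, never persisted to the metastore
spark.sql("""
    CREATE OR REPLACE TEMP VIEW named_rows AS
    SELECT name FROM demo_db.managed_tbl WHERE id IS NOT NULL
""")

# CTE: scoped to a single statement only
spark.sql("""
    WITH counted AS (SELECT COUNT(*) AS n FROM demo_db.managed_tbl)
    SELECT n FROM counted
""").show()
```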

ELT Part 1: Extract & Load Data into Delta Lake

Use Spark SQL to extract, load, and transform data to support production workloads and analytics in the Lakehouse.

  • Creating Tables
    • External sources vs Delta Lake tables
    • Methods to create tables and use cases
    • Delta table configurations
    • Different file formats and data sources
    • Create Table As Select statements
  • Writing Data to Tables
    • Methods to write to tables and use cases
    • Efficiency for different operations
    • Resulting behaviours in target tables
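
Here’s a hedged sketch of those extract-and-load patterns (the source paths and table names are made up):

```python
# CTAS: create a Delta table from an external source in one statement;
# the schema is inferred from the query
spark.sql("""
    CREATE OR REPLACE TABLE bronze_sales AS
    SELECT * FROM parquet.`dbfs:/raw/sales/`
""")

# Registering an external source directly (no data is copied)
spark.sql("""
    CREATE TABLE IF NOT EXISTS raw_lookup (code STRING, label STRING)
    USING CSV OPTIONS (header = 'true')
    LOCATION 'dbfs:/raw/lookup/'
""")

# Different write methods give different behaviours in the target table:
spark.sql("INSERT INTO bronze_sales SELECT * FROM new_batch")         # append
spark.sql("INSERT OVERWRITE bronze_sales SELECT * FROM full_reload")  # atomic replace
spark.sql("""
    MERGE INTO bronze_sales t            -- upsert: update matches, insert the rest
    USING updates u ON t.id = u.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```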

ELT Part 2: Use Spark SQL to Transform Data

Use Spark SQL to extract, load, and transform data to support production workloads and analytics in the Lakehouse.

  • Cleaning Data with Common SQL
    • Methods to deduplicate data
    • Common cleaning methods for different types of data
  • Combining Data
    • Join types and strategies
    • Set operators and applications
  • Reshaping Data
    • Different operations to transform arrays
    • Benefits of array functions
    • Applying higher order functions
  • Advanced Operations
    • Manipulating nested data fields
    • Applying SQL UDFs for custom transformations
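
To illustrate, a short sketch of the cleaning and reshaping patterns above (all table and column names are hypothetical):

```python
# Deduplicate on business keys
spark.sql("""
    CREATE OR REPLACE TEMP VIEW deduped_users AS
    SELECT DISTINCT user_id, email FROM raw_users
""")

# Combining data: a join plus a set operator
spark.sql("""
    SELECT o.order_id, u.email
    FROM orders o INNER JOIN deduped_users u ON o.user_id = u.user_id
    UNION
    SELECT order_id, email FROM archived_orders
""")

# Higher-order function on an array column, plus nested field access
spark.sql("""
    SELECT
        order_id,
        TRANSFORM(items, i -> i.price * i.qty) AS line_totals,  -- per-element lambda
        customer.address.city AS city                           -- nested struct field
    FROM orders
""")

# A SQL UDF for a reusable custom transformation
spark.sql("""
    CREATE OR REPLACE FUNCTION mask_email(email STRING)
    RETURNS STRING
    RETURN CONCAT('***', SUBSTRING(email, INSTR(email, '@')))
""")
```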

Just Enough Python

Leverage PySpark for advanced code functionality needed in production applications.

  • Spark SQL Queries
    • Executing Spark SQL queries in PySpark
  • Passing Data to SQL
    • Temporary Views
    • Converting tables to and from DataFrames
  • Python syntax
    • Functions and variables
    • Control flow
    • Error handling
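
A quick sketch of the PySpark/SQL interplay this section covers (names are illustrative):

```python
# Execute a Spark SQL query from Python; the result is a DataFrame
df = spark.sql("SELECT * FROM sales WHERE amount > 100")

# Pass a DataFrame to SQL via a temporary view, and read a table back into Python
df.createOrReplaceTempView("big_sales")
spark.sql("SELECT COUNT(*) AS n FROM big_sales").show()
sales_df = spark.table("sales")

# "Just enough" Python: a function with control flow and error handling
def safe_count(table_name):
    try:
        return spark.table(table_name).count()
    except Exception as err:
        print(f"Could not read {table_name}: {err}")
        return 0

for tbl in ["sales", "missing_table"]:
    print(tbl, safe_count(tbl))
```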

Incremental Data Processing (22%)

Incrementally processing data includes:

  • Structured Streaming (general concepts, triggers, watermarks)
  • Auto Loader (streaming reads)
  • Multi-hop Architecture (bronze-silver-gold, streaming applications)
  • Delta Live Tables (benefits and features)

Structured Streaming

  • General Concepts
    • Programming model
    • Configuration for reads and writes
    • End-to-end fault tolerance
    • Interacting with streaming queries
  • Triggers
    • Set up streaming writes with different trigger behaviours
  • Watermarks
    • Unsupported operations on streaming data
    • Scenarios in which watermarking data would be necessary
    • Managing state with watermarking
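
Here’s a minimal sketch tying those concepts together: a streaming read, a trigger choice, checkpointing for fault tolerance, and a watermark to bound state. All table names and paths are assumptions:

```python
from pyspark.sql import functions as F

# Streaming read from a Delta table
events = spark.readStream.table("bronze_events")

# A watermark bounds the state kept for late-arriving data in a windowed aggregate
counts = (events
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"))
    .count())

# The checkpoint gives end-to-end fault tolerance; the trigger sets batch behaviour
query = (counts.writeStream
    .option("checkpointLocation", "dbfs:/checkpoints/event_counts")
    .trigger(availableNow=True)            # process what's there, then stop
    # .trigger(processingTime="1 minute")  # or: run a micro-batch every minute
    .toTable("event_counts"))

query.awaitTermination()                   # block until the query stops
```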

Auto Loader

  • Define streaming reads with Auto Loader and PySpark to load data into Delta
  • Define streaming reads on tables for SQL manipulation
  • Identifying source locations
  • Use cases for using Auto Loader
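
As a sketch, an Auto Loader streaming read into Delta with PySpark (the landing path, schema location, and table name are hypothetical):

```python
# Auto Loader uses the "cloudFiles" source to incrementally pick up new files
raw = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "dbfs:/schemas/raw_events")  # schema tracking
    .load("dbfs:/landing/events/"))

# Expose the stream to SQL if you prefer SQL manipulation
raw.createOrReplaceTempView("raw_events_stream")

# Write the stream into a Delta table
(raw.writeStream
    .option("checkpointLocation", "dbfs:/checkpoints/raw_events")
    .trigger(availableNow=True)
    .toTable("bronze_events"))
```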

Multi-hop Architecture

  • Bronze
    • Bronze vs Raw tables
    • Workloads using bronze tables as source
  • Silver and Gold
    • Silver vs gold tables
    • Workloads using silver tables as source
  • Structured Streaming in Multi-hop
    • Converting data from bronze to silver levels with validation
    • Converting data from silver to gold levels with aggregation
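
A compact sketch of the bronze-to-silver-to-gold flow with Structured Streaming; the validation and aggregation rules here are placeholders:

```python
from pyspark.sql import functions as F

# Bronze -> Silver: cast, parse, and validate
silver = (spark.readStream.table("bronze_events")
    .select(F.col("id").cast("bigint").alias("id"),
            F.to_date("ts").alias("event_date"),
            "amount")
    .where("id IS NOT NULL AND amount >= 0"))   # simple validation rules

(silver.writeStream
    .option("checkpointLocation", "dbfs:/checkpoints/silver_events")
    .toTable("silver_events"))

# Silver -> Gold: business-level aggregation
gold = (spark.readStream.table("silver_events")
    .groupBy("event_date")
    .agg(F.sum("amount").alias("daily_amount")))

(gold.writeStream
    .outputMode("complete")   # unbounded aggregates need complete/update mode
    .option("checkpointLocation", "dbfs:/checkpoints/gold_daily")
    .toTable("gold_daily_amount"))
```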

Delta Live Tables

  • General concepts
    • Benefits of using Delta Live Tables for ETL
    • Scenarios that benefit from Delta Live Tables
  • UI
    • Deploying DLT pipelines from notebooks
    • Executing updates
    • Explore and evaluate results from DLT pipelines
  • SQL syntax
    • Converting SQL definitions to Auto Loader syntax
    • Common differences in DLT SQL syntax
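
The exam focuses on the DLT SQL syntax (statements along the lines of CREATE OR REFRESH STREAMING LIVE TABLE); to keep this post’s examples in one language, here’s the equivalent sketch with the DLT Python API. The paths and expectation rule are made up:

```python
import dlt

# Bronze: Auto Loader syntax inside a DLT pipeline
@dlt.table(comment="Raw events ingested incrementally with Auto Loader")
def bronze_events():
    return (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("dbfs:/landing/events/"))

# Silver: a declarative data-quality expectation drops bad rows
@dlt.table(comment="Validated events")
@dlt.expect_or_drop("valid_id", "id IS NOT NULL")
def silver_events():
    return dlt.read_stream("bronze_events").select("id", "ts", "amount")
```

Note that a notebook like this does nothing when run interactively; it is deployed and executed as a DLT pipeline from the UI, which is the “UI” part of this section.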

Production Pipelines (16%)

Build production pipelines for data engineering applications and Databricks SQL queries and dashboards, including:

  • Workflows (Job scheduling, task orchestration, UI)
  • Dashboards (endpoints, scheduling, alerting, refreshing)

Workflows

Orchestrate tasks with Databricks Workflows.

  • Automation
    • Setting up retry policies
    • Using cluster pools and why
  • Task Orchestration
    • Benefits of using multiple tasks in jobs
    • Configuring predecessor tasks
  • UI
    • Using notebook parameters in Jobs
    • Locating job failures using Job UI
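
For a feel of what a multi-task job looks like underneath the UI, here’s a sketch of a Jobs API 2.1-style payload written as a Python dict. Treat the field names as my best understanding of that API; the notebook paths, pool ID, and cluster ID are invented:

```python
job_config = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {
                "notebook_path": "/Repos/etl/ingest",
                "base_parameters": {"env": "prod"},   # notebook parameters
            },
            "new_cluster": {
                "spark_version": "11.3.x-scala2.12",
                "instance_pool_id": "pool-1234",      # pools cut cluster start-up time
                "num_workers": 2,
            },
            "max_retries": 2,                         # retry policy on failure
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],   # predecessor task
            "notebook_task": {"notebook_path": "/Repos/etl/transform"},
            "existing_cluster_id": "0000-000000-abcdefgh",
        },
    ],
}
```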

Dashboards

Using Databricks SQL for on-demand queries.

  • Databricks SQL Endpoints
    • Creating SQL endpoints for different use cases
  • Query Scheduling
    • Scheduling queries based on scenario
    • Query reruns based on an interval
  • Alerting
    • Configure notifications for different conditions
    • Configure and manage alerts for failure
  • Refreshing
    • Scheduling dashboard refreshes
    • Impact of query reruns on dashboard performance

Data Governance (9%)

Unity Catalog

  • Benefits of Unity Catalog
  • Unity Catalog Features

Entity Permissions

  • Team-based & User-based Permissions
  • Configuring access to production tables and databases
  • Granting different levels of permissions to users and groups
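
These permissions map onto SQL GRANT statements; a sketch with hypothetical groups, users, and objects:

```python
# Grant a group read access to a production table (plus usage on its database)
spark.sql("GRANT USAGE ON DATABASE prod_db TO `analysts`")
spark.sql("GRANT SELECT ON TABLE prod_db.sales TO `analysts`")

# Grant a single user write access
spark.sql("GRANT MODIFY ON TABLE prod_db.sales TO `etl.user@example.com`")

# Review what has been granted
spark.sql("SHOW GRANTS ON TABLE prod_db.sales").show()
```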

Tags: Databricks, Delta, Delta Live Tables
