This programme is offered as two distinct, parallel tracks: one focused on Databricks (AI-augmented data engineering across the lakehouse) and the other on Snowflake (in-platform AI through Cortex and Snowpark). Each cohort selects a single track. The training is designed for lateral hires and experienced professionals working with modern data architectures, and focuses on advanced data engineering, cloud data warehousing, and the integration of AI and ML workloads.
The Databricks track helps experienced data engineers integrate AI into day-to-day pipeline development. It begins with structured thinking through specifications, metadata grounding, and instruction frameworks, then progresses through AI-assisted SQL and PySpark generation, quality and DataOps practices, chained workflows, and agentic patterns within an enterprise governance framework (MCP, auditing, and permissions).
The Snowflake track applies AI capabilities natively within Snowflake. It covers Cortex and Snowpark, metadata-driven AI thinking, AI-assisted query and pipeline generation, Cortex enrichment functions (SUMMARIZE, CLASSIFY_TEXT, SENTIMENT), event-driven automation via Streams and Tasks, and Cortex Search / RAG patterns delivered as governed, Native App-style data products.
Both tracks emphasise governance, security, and cost considerations, and consolidate learning through an end-to-end capstone.
WHO SHOULD ATTEND
- Data engineers and platform teams ready to go deeper into AI-augmented engineering
- Cloud engineers working on enterprise analytics stacks built on Databricks or Snowflake
- Professionals targeting senior Data Engineering or Architect roles
- Lateral hires upskilling into modern data architectures and AI/ML workload integration
- Engineers introducing AI tooling into existing CI/CD, DataOps and enterprise governance practices
PREREQUISITES
- Strong SQL fundamentals — joins, aggregations, and basic query optimisation
- Working knowledge of Python or PySpark for data processing and pipeline development
- Working knowledge of data warehousing concepts and cloud-based data platforms
- Understanding of ETL/ELT workflows, including batch and incremental processing
- Familiarity with Git: commits, branches, repositories
- Awareness of data quality and validation concepts (null handling, schema checks, basic testing)
- Basic awareness of AI/ML concepts such as embeddings, classification, or text processing
KEY OUTCOMES
- Translate business use cases into structured engineering specifications, applying metadata, schema context and instruction frameworks
- Evaluate where AI capabilities fit within data workflows — Cortex and Snowpark on Snowflake, or AI-assisted PySpark and SQL on Databricks
- Build and optimise AI-assisted data pipelines covering transformation, debugging, testing and data quality validation
- Implement enrichment, automation and incremental processing using Streams & Tasks with Cortex functions on Snowflake, or AI-augmented orchestration and CI/CD on Databricks
- Design and validate retrieval-based and search-driven use cases: Cortex Search / RAG on Snowflake, or RAG for metadata access and chained workflows on Databricks
- Operate AI tools within enterprise governance frameworks: secure access, permission control, auditability and responsible use
- Apply cost, performance and efficiency considerations to AI-assisted data pipelines
- Establish human review checkpoints for production-grade reliability across architecture, code quality, testing and release readiness