
Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
BenoitC
Alteryx

Problem Statement

 

Today, many organisations face the same recurring challenge:

 

Data is engineered in one place, analysed in another, and the connection between the two is often manual, fragile, or inefficient.

 

  • Data engineers work in Databricks, designing scalable pipelines and Lakehouse architectures.
  • Business analysts work in Alteryx or BI tools, needing clean, trusted, up-to-date data to make decisions.
  • IT teams struggle to provide governed, real-time access without duplicating data or adding operational overhead.
  • And small teams or newcomers often believe that building a modern data pipeline requires heavy infrastructure or enterprise licences.

As a result, companies often end up with:

 

  • Inconsistent datasets across teams,
  • Delays between engineering and analytics,
  • Repeated data extraction or replication,
  • Difficulty operationalising insights,
  • Frustration on both sides of the “tech vs business” gap.

This article addresses exactly that problem.

 

By using only Databricks Free Edition and Alteryx One, we demonstrate that anyone can:

 

  • Build a structured Bronze / Silver / Gold pipeline using Delta Lake
  • Expose an analytics-ready table through a Databricks Serverless SQL Warehouse
  • Connect Alteryx Cloud via Live Query without moving or duplicating data
  • Enrich the dataset with business logic in a no-code Alteryx workflow
  • Publish a clean dataset within minutes

The value?

 

A fully modern, end-to-end, reproducible analytics pipeline, accessible to both data engineers and business users, without needing a full cloud environment or complex infrastructure.

If your goal is to understand how to connect the Lakehouse world (Databricks) with the no-code analytics world (Alteryx), this article shows the how and the why through a practical example you can reproduce today.

1. Introduction: Why Databricks and Alteryx?

 

In this article, I’ll walk through a simple yet powerful end-to-end workflow demonstrating how to combine Databricks for scalable data engineering with Alteryx One for intuitive, no-code analytics.

Even with only:

  • Databricks Free Edition,
  • a single CSV file,
  • a small Excel reference table,

…it’s possible to build a pipeline inspired by modern Medallion Architecture, expose clean Delta tables, and make them instantly consumable through Live Query in Alteryx One.

The goal is not to replicate a full enterprise setup, but to show how both platforms complement each other and accelerate analytics for technical and business users alike.

2. End-to-End Architecture Overview


Here is the architecture we will build:

 

  • Ingest a CSV file into Databricks
  • Apply Bronze → Silver → Gold transformations
  • Store the refined table as a Delta Lake table
  • Connect Alteryx Cloud Live Query to the Gold table
  • Enrich the dataset with an Excel file (business targets and reference data)
  • Perform no-code transformations in Alteryx
  • Publish a Power BI dashboard for final insights

The key message is:
Databricks handles scalable data preparation; Alteryx unlocks business-ready analytics.

3. Databricks Pipeline: Simple, Reproducible, and Modern

Even with the Free Edition, Databricks provides everything needed to structure a clear data engineering workflow using Delta Lake and notebooks.

Onboarding for the Free Edition is very easy. You can sign up in just a few clicks by searching “Databricks Free Edition” and opening the official link.

BenoitC_0-1768236498950.png

You can sign up for the Free Edition here:

BenoitC_1-1768236498960.png

Once you complete the initial steps, you have access to Databricks. Congratulations!

BenoitC_2-1768236498968.png

3.1 Bronze – Raw Ingestion

 

We start by uploading a CSV file into DBFS (or ingesting it directly from a cloud bucket, if preferred).

 

By clicking “Upload Data,” you can directly add flat files into Databricks. For the purpose of this article, we keep things simple by adding raw data directly into DBFS.

 

BenoitC_3-1768236498976.png

The process is straightforward.

BenoitC_4-1768236498984.png

Databricks automatically converts the uploaded file into a Delta table, allowing us to preserve raw data in a single, unified environment.
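
If you prefer to script this step rather than use the UI, a notebook cell along these lines does the same thing. This is a minimal sketch that runs in a Databricks notebook (where spark is predefined); the file path and table name are illustrative assumptions:

```python
# Read the uploaded CSV and save it as a Delta table,
# mirroring what the "Upload Data" UI does behind the scenes.
raw_df = (spark.read
          .option("header", True)       # first row holds column names
          .option("inferSchema", True)  # let Spark infer column types
          .csv("/FileStore/tables/sales_raw.csv"))

raw_df.write.format("delta").mode("overwrite").saveAsTable("sales_raw")
```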

 

Opening a notebook, we can now see that our table is available:

BenoitC_5-1768236498988.png

Databricks also provides Serverless clusters, meaning you don’t need to configure or manage any compute to start working with your data. It just works: Databricks handles all the compute in the background.

 

Our files are now fully available in Databricks.

BenoitC_6-1768236498995.png

To simulate a production environment, we now copy our data from Raw to Bronze. The Medallion architecture structures data into three layers: Bronze (raw), Silver (cleaned and standardized), and Gold (analytics-ready).
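
For reference, here is a minimal sketch of what this Raw-to-Bronze copy might look like in a notebook cell. The table names and the added metadata columns are assumptions for illustration:

```python
from pyspark.sql import functions as F

# Promote the raw table to the Bronze layer, stamping each row
# with ingestion metadata to make lineage easier to trace later.
bronze_df = (spark.table("sales_raw")
             .withColumn("_ingested_at", F.current_timestamp())
             .withColumn("_source_file", F.lit("sales_raw.csv")))

bronze_df.write.format("delta").mode("overwrite").saveAsTable("sales_bronze")
```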

BenoitC_7-1768236499006.png

Tables are now created and ready for cleansing in the Silver layer.

BenoitC_8-1768236499013.png

We are now ready to move to the next stage.

 

3.2 Silver – Cleaning & Standardization

 

The Silver layer produces a clean, consistent dataset that enables value creation in the downstream Gold layer. Uncleaned data often contains inconsistent types, missing values, duplicate records, and other quality issues.

 

To do this, we stay in the same notebook and switch to Python, showing Databricks’ flexibility in letting users work in the language they prefer. We use Spark SQL from PySpark so we can manipulate the data directly in the notebook.

BenoitC_9-1768236499020.png

In a single step, we can now see clean data in our Silver layer after applying correct data types, recalculating amounts, and adding quality filters.
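
As a rough sketch, the cleaning cell could look like the code below. The column names, date format, and filter rules are assumptions; adapt them to your own schema:

```python
from pyspark.sql import functions as F

# Silver layer: enforce types, recalculate the amount, apply quality filters.
silver_df = (spark.table("sales_bronze")
             .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
             .withColumn("quantity", F.col("quantity").cast("int"))
             .withColumn("unit_price", F.col("unit_price").cast("double"))
             .withColumn("amount", F.round(F.col("quantity") * F.col("unit_price"), 2))
             .filter(F.col("order_id").isNotNull())   # drop incomplete rows
             .filter(F.col("quantity") > 0)           # remove invalid quantities
             .dropDuplicates(["order_id"]))           # one row per transaction

silver_df.write.format("delta").mode("overwrite").saveAsTable("sales_silver")
```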

BenoitC_10-1768236499030.png

This step can be directly automated from the Notebook interface, allowing us to eliminate manual effort and reduce operational toil.

 

BenoitC_11-1768236499036.png

We are now ready to build our Gold layer in Databricks.

BenoitC_12-1768236499045.png

3.3 Gold – Analytics-Ready Table

We simply run a LEFT JOIN between our two Silver tables to produce the Gold table, which is now ready for downstream analytics.
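
The join itself is a single SQL statement. This sketch assumes customer_id is the join key and that a few descriptive customer columns exist; adjust them to your schema:

```python
# Gold layer: enrich each sale with customer attributes via a LEFT JOIN,
# keeping transactions even when no matching customer exists.
spark.sql("""
    CREATE OR REPLACE TABLE fact_sales_gold AS
    SELECT s.*,
           c.customer_name,
           c.segment,
           c.country
    FROM sales_silver s
    LEFT JOIN customers_silver c
      ON s.customer_id = c.customer_id
""")
```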

 

BenoitC_13-1768236499049.png

We can now run our Alteryx workflow on this data.

BenoitC_14-1768236499062.png

We treat sales_silver as our transactional fact table (each row represents a transaction) and customers_silver as our cleaned customer dimension.

In the Gold layer, we bring both together into a single fact_sales_gold table, which is the one we expose to Alteryx via Live Query.

4. Connecting Alteryx One to Databricks with Live Query


Alteryx One now allows us to use all Alteryx products in a single, seamless experience, whether in the cloud or on a laptop. We first connect to our Databricks data using Alteryx One. To do this, we go to the Alteryx One homepage, click on our profile, and navigate to Workspace Admin.

BenoitC_15-1768236499071.png

Here we can see the Databricks menu, where we can provision our Databricks workspace:

BenoitC_16-1768236499075.png

This information can be found easily in Databricks.

BenoitC_17-1768236499078.png

The service URL is the portion of your Databricks workspace URL up to cloud.databricks.com.
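
For example, if your workspace URL is https://dbc-a1b2c3d4-e5f6.cloud.databricks.com/?o=12345 (an illustrative address, not a real workspace), the service URL to enter is https://dbc-a1b2c3d4-e5f6.cloud.databricks.com.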

 

BenoitC_18-1768236499083.png

We now return to Databricks to generate the Personal Access Token (PAT). Be mindful of the security implications: these keys should never be shared. Go to your profile, open Settings → Developer, and generate a new token as shown below. Paste this token into Alteryx.
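
Before pasting the token into Alteryx, you can optionally sanity-check it against your SQL Warehouse from any Python environment. This sketch assumes the databricks-sql-connector package; the hostname, HTTP path, and token are placeholders you copy from your own workspace:

```python
# pip install databricks-sql-connector
from databricks import sql

# All three values are placeholders: take the real ones from your
# SQL Warehouse's "Connection details" tab and your new PAT.
with sql.connect(server_hostname="dbc-a1b2c3d4-e5f6.cloud.databricks.com",
                 http_path="/sql/1.0/warehouses/<warehouse-id>",
                 access_token="<your-personal-access-token>") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM fact_sales_gold")
        print(cur.fetchone())  # a row count confirms token and endpoint work
```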

BenoitC_19-1768236499088.png

Everything is now set. You just need to fill in the remaining information:

BenoitC_20-1768236499093.png

As a final step, in the Data tab, we simply need to add the connection — and we are all set:

BenoitC_21-1768236499095.png

Just add a connection name; all other information has already been filled in:

BenoitC_22-1768236499100.png

Once connected, Alteryx queries the Delta table live, without moving or duplicating data, which is a perfect fit for Lakehouse patterns.

This allows data engineers to refine the pipeline in Databricks while analysts explore the same data instantly in Alteryx.

5. No-Code Business Enrichment in Alteryx


We can now select our Gold table and begin working with our data:

BenoitC_23-1768236499103.png

When loading our data in Alteryx, nothing is actually imported into the backend. Everything stays in Databricks, keeping costs low and minimizing data movement:

BenoitC_24-1768236499106.png

This is where Alteryx shines: turning a curated dataset into actionable business insights — without writing code.

We now create an Alteryx workflow in Designer Cloud. From the homepage, click Create New → Designer Cloud to begin:

BenoitC_25-1768236499110.png

We can now add an Input tool and start working with our Gold table from Databricks:

BenoitC_26-1768236499114.png

By default, Live Query is enabled, allowing us to use the entire dataset directly in our browser without any replication in the Alteryx infrastructure. This is a major advantage: it enables full pushdown processing and lets users leverage no-code tools without copying data, relying instead on the powerful scaling capabilities of Databricks.

You can verify whether Live Query is enabled by clicking your profile icon (top right), navigating to Workspace Admin → Settings, and checking the Enable Live Query option.

BenoitC_27-1768236499120.png

We can now import our Excel file into Alteryx One:

BenoitC_28-1768236499125.png

We can now add the remaining tools and easily prep and blend the data, without writing a single line of code:

BenoitC_29-1768236499131.png

And the beauty of this approach is that all processing happens in Databricks.
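
If you want to confirm this, open your SQL Warehouse’s Query History in Databricks while the workflow runs; the queries issued by Alteryx should appear there, executed by the warehouse itself.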

BenoitC_30-1768236499136.png

6. Combined Benefits of Databricks + Alteryx

🟧 What Databricks brings

  • scalable Spark compute
  • strong data engineering foundations
  • Delta Lake performance & reliability
  • structured Medallion architecture (Bronze / Silver / Gold)

🟦 What Alteryx brings

  • no-code transformation for business users
  • governed access to Databricks tables
  • fast iteration for analytics and enrichment
  • seamless export to BI tools

🟪 Together

Together, they deliver a modern, efficient workflow that bridges engineering and business teams — without unnecessary complexity.

7. Conclusion

This project demonstrates that, even with minimal resources (Databricks Free Edition and an Alteryx One environment), it is entirely possible to build a modern Lakehouse-style pipeline and deliver business-ready insights.

Databricks provides the engine, Alteryx provides the experience, and together they accelerate analytics from raw data to actionable value.

 

 

Comments
BS_THE_ANALYST
15 - Aurora

I love the concept of bringing Alteryx and Databricks together. I'm not sold on the Designer Cloud example workflow. I'd love to have seen a better use case. You could argue that an analyst could get very close (if not exactly the same) results by using the Databricks Assistant for building the code (it's basically no code if they're vibe coding 😂). How can we really showcase the power of Alteryx here?

 

I think there could be some real value with the role of Analytics Engineering in this case. The Data Engineer, in Databricks, creates the Bronze/Silver layer, and we take it from there? We could also tap into Alteryx's Auto Insights by bringing it into the Alteryx One world.

 

Given that you're creating an output for BI, I'd love to see this taken a step further, especially as you've named Power BI as your BI of choice. Can you create and connect to a semantic model that Alteryx has helped create?

 

This could be a really cool series @BenoitC . Looking forward to following along. 

 

I know you've mentioned that this isn't "enterprise", but I'd urge you to put a warning in the article. Databricks Free Edition doesn't provide a safe location for sensitive information. If a company is reading the article and building out a POC similar to the one you've described, they might expose their data unintentionally.

BS_THE_ANALYST_0-1768403333997.png