
Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
BenoitC
Alteryx

Problem Statement

 

Today, many organisations face the same recurring challenge:

 

Data is engineered in one place, analysed in another, and the connection between the two is often manual, fragile, or inefficient.

 

  • Data engineers work in Databricks, designing scalable pipelines and Lakehouse architectures.
  • Business analysts work in Alteryx or BI tools, needing clean, trusted, up-to-date data to make decisions.
  • IT teams struggle to provide governed, real-time access without duplicating data or adding operational overhead.
  • And small teams or newcomers often believe that building a modern data pipeline requires heavy infrastructure or enterprise licences.

As a result, companies often end up with:

 

  • Inconsistent datasets across teams,
  • Delays between engineering and analytics,
  • Repeated data extraction or replication,
  • Difficulty operationalising insights,
  • Frustration on both sides of the “tech vs business” gap.

This article addresses exactly that problem.

 

By using only Databricks Free Edition and Alteryx One, we demonstrate that anyone can:

 

  • Build a structured Bronze / Silver / Gold pipeline using Delta Lake
  • Expose an analytics-ready table through a Databricks Serverless SQL Warehouse
  • Connect Alteryx Cloud via Live Query without moving or duplicating data
  • Enrich the dataset with business logic in a no-code Alteryx workflow
  • Publish a clean dataset within minutes

The value?

 

A fully modern, end-to-end, reproducible analytics pipeline, accessible to both data engineers and business users, without needing a full cloud environment or complex infrastructure.

If your goal is to understand how to connect the Lakehouse world (Databricks) with the no-code analytics world (Alteryx), this article shows the how and the why through a practical example you can reproduce today.

1. Introduction: Why Databricks and Alteryx?

 

In this article, I’ll walk through a simple yet powerful end-to-end workflow demonstrating how to combine Databricks for scalable data engineering with Alteryx One for intuitive, no-code analytics.

Even with only:

  • Databricks Free Edition,
  • a single CSV file,
  • a small Excel reference table,

…it’s possible to build a pipeline inspired by modern Medallion Architecture, expose clean Delta tables, and make them instantly consumable through Live Query in Alteryx One.

The goal is not to replicate a full enterprise setup, but to show how both platforms complement each other and accelerate analytics for technical and business users alike.

2. End-to-End Architecture Overview


Here is the architecture we will build:

 

  • Ingest a CSV file into Databricks
  • Apply Bronze → Silver → Gold transformations
  • Store the refined table as a Delta Lake table
  • Connect Alteryx Cloud Live Query to the Gold table
  • Enrich the dataset with an Excel file (business targets and reference data)
  • Perform no-code transformations in Alteryx
  • Publish a Power BI dashboard for final insights

The key message is:
Databricks handles scalable data preparation; Alteryx unlocks business-ready analytics.

3. Databricks Pipeline: Simple, Reproducible, and Modern

Even with the Free Edition, Databricks provides everything needed to structure a clear data engineering workflow using Delta Lake and notebooks.

Onboarding for the Free Edition is very easy. You can sign up in just a few clicks by searching “Databricks Free Edition” and opening the official link.

BenoitC_0-1768236498950.png

You can sign up for the Free Edition here:

BenoitC_1-1768236498960.png

Once you complete the initial steps, you have access to Databricks. Congratulations!

BenoitC_2-1768236498968.png

3.1 Bronze – Raw Ingestion

 

We start by uploading a CSV file into DBFS (or ingesting it directly from a cloud bucket, if preferred).

 

By clicking “Upload Data,” you can directly add flat files into Databricks. For the purpose of this article, we keep things simple by adding raw data directly into DBFS.

 

BenoitC_3-1768236498976.png

The process is straightforward.

BenoitC_4-1768236498984.png

Databricks automatically converts the uploaded file into a Delta table, allowing us to preserve raw data in a single, unified environment.
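
If you prefer to script this step rather than use the UI, a notebook cell along these lines does the same thing. This is a minimal sketch that runs in a Databricks notebook (where spark is predefined); the file path and table name are illustrative assumptions:

```python
# Read the uploaded CSV and save it as a Delta table,
# mirroring what the "Upload Data" UI does behind the scenes.
raw_df = (spark.read
          .option("header", True)       # first row holds column names
          .option("inferSchema", True)  # let Spark infer column types
          .csv("/FileStore/tables/sales_raw.csv"))

raw_df.write.format("delta").mode("overwrite").saveAsTable("sales_raw")
```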

 

Opening a notebook, we can now see that our table is available:

BenoitC_5-1768236498988.png

Databricks also provides Serverless clusters, meaning you don’t need to configure or manage any compute to start working with your data. It just works: Databricks handles all the compute in the background.

 

Our files are now fully available in Databricks.

BenoitC_6-1768236498995.png

To simulate a production environment, we now copy our data from Raw to Bronze. The Medallion architecture structures data into three layers: Bronze (raw), Silver (cleaned and standardized), and Gold (analytics-ready).
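
For reference, here is a minimal sketch of what this Raw-to-Bronze copy might look like in a notebook cell. The table names and the added metadata columns are assumptions for illustration:

```python
from pyspark.sql import functions as F

# Promote the raw table to the Bronze layer, stamping each row
# with ingestion metadata to make lineage easier to trace later.
bronze_df = (spark.table("sales_raw")
             .withColumn("_ingested_at", F.current_timestamp())
             .withColumn("_source_file", F.lit("sales_raw.csv")))

bronze_df.write.format("delta").mode("overwrite").saveAsTable("sales_bronze")
```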

BenoitC_7-1768236499006.png

Tables are now created and ready for cleansing in the Silver layer.

BenoitC_8-1768236499013.png

We are now ready to move to the next stage.

 

3.2 Silver – Cleaning & Standardization

 

The Silver layer produces a clean, consistent dataset that enables value creation in the downstream Gold layer. Uncleaned data often contains inconsistent types, missing values, duplicate records, and other quality issues.

 

To do this, we stay in the same notebook and switch to Python, showing Databricks’ flexibility in letting users work in the language they prefer. We use Spark SQL from PySpark so we can manipulate the data directly in the notebook.

BenoitC_9-1768236499020.png

In a single step, we can now see clean data in our Silver layer after applying correct data types, recalculating amounts, and adding quality filters.
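
As a rough sketch, the cleaning cell could look like the code below. The column names, date format, and filter rules are assumptions; adapt them to your own schema:

```python
from pyspark.sql import functions as F

# Silver layer: enforce types, recalculate the amount, apply quality filters.
silver_df = (spark.table("sales_bronze")
             .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
             .withColumn("quantity", F.col("quantity").cast("int"))
             .withColumn("unit_price", F.col("unit_price").cast("double"))
             .withColumn("amount", F.round(F.col("quantity") * F.col("unit_price"), 2))
             .filter(F.col("order_id").isNotNull())   # drop incomplete rows
             .filter(F.col("quantity") > 0)           # remove invalid quantities
             .dropDuplicates(["order_id"]))           # one row per transaction

silver_df.write.format("delta").mode("overwrite").saveAsTable("sales_silver")
```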

BenoitC_10-1768236499030.png

This step can be directly automated from the Notebook interface, allowing us to eliminate manual effort and reduce operational toil.

 

BenoitC_11-1768236499036.png

We are now ready to build our Gold layer in Databricks.

BenoitC_12-1768236499045.png

3.3 Gold – Analytics-Ready Table

We simply run a LEFT JOIN between our two Silver tables to produce the Gold table, which is now ready for downstream analytics.
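
The join itself is a single SQL statement. This sketch assumes customer_id is the join key and that a few descriptive customer columns exist; adjust them to your schema:

```python
# Gold layer: enrich each sale with customer attributes via a LEFT JOIN,
# keeping transactions even when no matching customer exists.
spark.sql("""
    CREATE OR REPLACE TABLE fact_sales_gold AS
    SELECT s.*,
           c.customer_name,
           c.segment,
           c.country
    FROM sales_silver s
    LEFT JOIN customers_silver c
      ON s.customer_id = c.customer_id
""")
```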

 

BenoitC_13-1768236499049.png

We can now run our Alteryx workflow on this data.

BenoitC_14-1768236499062.png

We treat sales_silver as our transactional fact table (each row represents a transaction) and customers_silver as our cleaned customer dimension.

In the Gold layer, we bring both together into a single fact_sales_gold table, which is the one we expose to Alteryx via Live Query.

4. Connecting Alteryx One to Databricks with Live Query


Alteryx One now allows us to use all Alteryx products in a single, seamless experience, whether in the cloud or on a laptop. We first connect to our Databricks data using Alteryx One. To do this, we go to the Alteryx One homepage, click on our profile, and navigate to Workspace Admin.

BenoitC_15-1768236499071.png

Here we can see the Databricks menu, where we can provision our Databricks workspace:

BenoitC_16-1768236499075.png

This information can be found easily in Databricks.

BenoitC_17-1768236499078.png

The service URL is the portion of your Databricks workspace URL up to cloud.databricks.com.
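
For example, if your workspace URL is https://dbc-a1b2c3d4-e5f6.cloud.databricks.com/?o=12345 (an illustrative address, not a real workspace), the service URL to enter is https://dbc-a1b2c3d4-e5f6.cloud.databricks.com.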

 

BenoitC_18-1768236499083.png

We now return to Databricks to generate the Personal Access Token (PAT). Be mindful of the security implications: these keys should never be shared. Go to your profile, open Settings → Developer, and generate a new token as shown below. Paste this token into Alteryx.
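
Before pasting the token into Alteryx, you can optionally sanity-check it against your SQL Warehouse from any Python environment. This sketch assumes the databricks-sql-connector package; the hostname, HTTP path, and token are placeholders you copy from your own workspace:

```python
# pip install databricks-sql-connector
from databricks import sql

# All three values are placeholders: take the real ones from your
# SQL Warehouse's "Connection details" tab and your new PAT.
with sql.connect(server_hostname="dbc-a1b2c3d4-e5f6.cloud.databricks.com",
                 http_path="/sql/1.0/warehouses/<warehouse-id>",
                 access_token="<your-personal-access-token>") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM fact_sales_gold")
        print(cur.fetchone())  # a row count confirms token and endpoint work
```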

BenoitC_19-1768236499088.png

Everything is now set. You just need to fill in the remaining information:

BenoitC_20-1768236499093.png

As a final step, in the Data tab, we simply need to add the connection — and we are all set:

BenoitC_21-1768236499095.png

Just add a connection name; all other information has already been filled in:

BenoitC_22-1768236499100.png

Once connected, Alteryx queries the Delta table live, without moving or duplicating data, which is a perfect fit for Lakehouse patterns.

This allows data engineers to refine the pipeline in Databricks while analysts explore the same data instantly in Alteryx.

5. No-Code Business Enrichment in Alteryx


We can now select our Gold table and begin working with our data:

BenoitC_23-1768236499103.png

When loading our data in Alteryx, nothing is actually imported into the backend. Everything stays in Databricks, keeping costs low and minimizing data movement:

BenoitC_24-1768236499106.png

This is where Alteryx shines: turning a curated dataset into actionable business insights — without writing code.

We now create an Alteryx workflow in Designer Cloud. From the homepage, click Create New → Designer Cloud to begin:

BenoitC_25-1768236499110.png

We can now add an Input tool and start working with our Gold table from Databricks:

BenoitC_26-1768236499114.png

By default, Live Query is enabled, allowing us to use the entire dataset directly in our browser without any replication in the Alteryx infrastructure. This is a major advantage: it enables full pushdown processing and lets users leverage no-code tools without copying data, relying instead on the powerful scaling capabilities of Databricks.

You can verify whether Live Query is enabled by clicking your profile icon (top right), navigating to Workspace Admin → Settings, and checking the Enable Live Query option.

BenoitC_27-1768236499120.png

We can now import our Excel file into Alteryx One:

BenoitC_28-1768236499125.png

We can now add the remaining tools and easily prep and blend the data, without writing a single line of code:

BenoitC_29-1768236499131.png

And the beauty of this approach is that all processing happens in Databricks.
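
If you want to confirm this, open your SQL Warehouse’s Query History in Databricks while the workflow runs; the queries issued by Alteryx should appear there, executed by the warehouse itself.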

BenoitC_30-1768236499136.png

6. Combined Benefits of Databricks + Alteryx

🟧 What Databricks brings

  • scalable Spark compute
  • strong data engineering foundations
  • Delta Lake performance & reliability
  • structured Medallion architecture (Bronze / Silver / Gold)

🟦 What Alteryx brings

  • no-code transformation for business users
  • governed access to Databricks tables
  • fast iteration for analytics and enrichment
  • seamless export to BI tools

🟪 Together

Together, they deliver a modern, efficient workflow that bridges engineering and business teams — without unnecessary complexity.

7. Conclusion

This project demonstrates that, even with minimal resources (Databricks Free Edition and an Alteryx One environment), it is entirely possible to build a modern Lakehouse-style pipeline and deliver business-ready insights.

Databricks provides the engine, Alteryx provides the experience, and together they accelerate analytics from raw data to actionable value.

 

 

Comments
BS_THE_ANALYST
15 - Aurora

I love the concept of bringing Alteryx and Databricks together. I'm not sold on the Designer Cloud example workflow. I'd love to have seen a better use case. You could argue that an analyst could get very close (if not exactly the same) results by using the Databricks Assistant for building the code (it's basically no code if they're vibe coding 😂). How can we really showcase the power of Alteryx here?

 

I think there could be some real value with the role of Analytics Engineering in this case. The Data Engineer, in Databricks, creates the Bronze/Silver layer, and we take it from there? We could also tap into Alteryx's Auto Insights by bringing it into the Alteryx One world.

 

Given that you're creating an output for BI, I'd love to see this taken a step further, especially as you've named Power BI as your BI of choice. Can you create and connect to a semantic model that Alteryx has helped create?

 

This could be a really cool series @BenoitC . Looking forward to following along. 

 

I know you've mentioned that this isn't "enterprise", but I'd urge you to put a warning in the article. Databricks Free Edition doesn't provide a safe location for sensitive information. If a company is reading the article and building out a POC similar to the one you've described, they might expose their data unintentionally.

BS_THE_ANALYST_0-1768403333997.png