

Job fails when running env is Dataflow but succeeds when running env is BigQuery

Auggy

Hi Community, 

 

We have created a flow and recipe in Dataprep.

We run our flows both via GCP Composer and directly in Dataprep.

Our flow's output Running environment parameter is set to Dataflow + BigQuery.

 

 

When we run the flow directly from the Dataprep UI, the selected running environment is BigQuery and the job succeeds.

 

However, when the same flow is run from Composer (the orchestrator), with a DAG calling Dataprep, the running environment is Dataflow.

On Dataflow, the job fails with a schema error:

2023-12-07 18:19:29.761 EST
Error message from worker: java.lang.IllegalStateException: The schema of the BigQuery table does not match the recipe. Expected [String, Integer, String, etc
 
Questions:
1) Why would this error occur for the same flow?
2) Why does it not pick the BigQuery running environment? Is there a configuration setting for this?
3) Why does it fail when running on Dataflow but succeed when running on BigQuery?
 
Thank you for your help.
 
A
nkuipers
Alteryx

Hi @Auggy,

 

What happens if you disable optimization for the flow in Dataprep, forcing it to run strictly on Dataflow there as well? Are the results then consistent with the run triggered from Composer (the orchestrator) via the DAG? If so, I would infer that:

  • BQ pushdown is filtering or otherwise optimizing a schema mismatch away
  • Composer is unable to execute the job on BigQuery, though the root cause is unclear (e.g., does it have permission?)
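One way to check the first point is to line up the column types the recipe produces against the columns of the target BigQuery table, position by position, since the Dataflow error reports an expected list of types. Below is a minimal, hypothetical Python sketch. It does not call Dataprep or BigQuery. The type lists are made-up placeholders you would fill in by hand from the recipe and the table schema, and it assumes both lists use the same type vocabulary.

```python
from itertools import zip_longest


def schema_mismatches(expected, actual):
    """Compare two ordered lists of column types position by position.

    Returns (position, expected_type, actual_type) for every position that
    differs. None marks a column that exists on only one side.
    """
    problems = []
    for position, (want, got) in enumerate(zip_longest(expected, actual)):
        if want != got:
            problems.append((position, want, got))
    return problems


# Hypothetical example: recipe output vs. existing table columns.
recipe_types = ["String", "Integer", "String"]
table_types = ["String", "String", "String", "Integer"]
print(schema_mismatches(recipe_types, table_types))
```

An empty result means the two lists line up. Any entries point at the columns worth checking in the recipe or the table.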

Let us know how it goes.

 

It sounds like Dataprep is working as expected, so it might be worthwhile to start a discussion with GCP support in parallel.

 

Cheers,

 

Nathanael