Alteryx Server Discussions


Parquet files being generated on Alteryx Server

mkosmicki1
8 - Asteroid

Hi there, I'm hoping the collective power of the community can help me solve a mystery.

I have a scheduled workflow that pulls data from a Redshift server and writes it out to a YXDB file. Sometime in September 2023, this process started generating Parquet files on the Alteryx server. The files took up a massive amount of space and nearly crashed the server a few weeks ago.

This workflow has been running for several years with no issues. I am trying to figure out why it is suddenly creating parquet files and storing them on the Alteryx server.

Yes, I am well aware that Alteryx doesn't support Parquet files. Yet this workflow is causing Parquet files to be created and stored on the Alteryx server.

Yes, I have asked Alteryx Support about the issue. All Support could tell me (repeatedly) was that Alteryx doesn't support Parquet files. A rather confusing answer, considering the Server is creating the Parquet files.

 

Has anyone else had this happen? I can always stop the workflow from running. But that doesn't give me an explanation as to why the parquet files are being created.

3 REPLIES
fmvizcaino
17 - Castor

Hi @mkosmicki1 ,

 

Unless something very weird is going on, Alteryx doesn't create Parquet files as part of its internal processes.

 

Where are these files being saved? Please share the path.

 

How is the workflow pulling data from Redshift? If you can take a screenshot of your workflow, that would be helpful, even better if you could share the workflow.

 

Best,

Fernando Vizcaino

apathetichell
18 - Pollux

Is the Redshift cluster pulling data from S3 buckets? Are the files natively Parquet in the S3 bucket? How are the files being migrated from Redshift to Alteryx? Is there an S3 integration being used - and if so does it use the Alteryx native S3 tool?

 

My hunch:

Hypothesis 1: Something changed with the underlying files in S3 (or with Redshift) so that they are now in Parquet, and something in your workflow is bringing these files in directly from S3.

Hypothesis 2: There was a Server update. One of the tools used (S3/Redshift/something) was written in such a way that it looks at the underlying files, and that is now causing an issue.

Hypothesis 3: Some Redshift cluster update (maybe to serverless?) which has had a downstream effect.

 

I'm totally curious - so feel free to reach out via DM.

mkosmicki1
8 - Asteroid

After more investigation, it appears that the Tableau Output tool (which writes to Tableau Cloud using a PAT) was creating the Parquet files. They were being deposited on the C drive of our Alteryx server under our user admin\App Data\Temp\Local location.
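For anyone who wants to check their own server for the same symptom, here is a minimal sketch that walks a folder and lists Parquet files by size. It assumes the files carry a `.parquet` extension (I have not confirmed what the tool names them), and `find_parquet_files` is just a name made up for this example. Point it at whichever temp folder your Server's run-as user writes to.

```python
import os
import tempfile
from pathlib import Path


def find_parquet_files(root):
    """Return (path, size_in_bytes) for every *.parquet file under root, largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Assumption: the stray files end in ".parquet" (case-insensitive).
            if name.lower().endswith(".parquet"):
                full = Path(dirpath) / name
                try:
                    hits.append((full, full.stat().st_size))
                except OSError:
                    # File vanished or is locked by a running job; skip it.
                    continue
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits


if __name__ == "__main__":
    # Defaults to the current user's temp folder; swap in the folder you care about.
    root = tempfile.gettempdir()
    files = find_parquet_files(root)
    total_mb = sum(size for _, size in files) / (1024 * 1024)
    print(f"{len(files)} Parquet file(s) under {root}, {total_mb:.1f} MB in total")
    for path, size in files[:20]:
        print(f"{size:>14,d}  {path}")
```

Running this on a schedule and watching the total is a cheap way to catch the disk filling up before the server gets into trouble.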

I loaded one of the Parquet files into our Snowflake instance to see what was in the file and discovered that it was data I was writing out to Tableau Cloud. This prompted me to remove two scheduled workflows that were writing to Tableau Cloud. Since removing these two workflows, we have not had any Parquet files created on Alteryx Server.
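If you do not have a warehouse handy to load a suspect file into, a quick first check is whether the file really is Parquet. A plain (unencrypted) Parquet file starts and ends with the 4-byte marker `PAR1`. The sketch below only tests for that marker and does not read the contents; `looks_like_parquet` is a hypothetical helper name.

```python
def looks_like_parquet(path):
    """Return True if the file begins and ends with the Parquet magic bytes 'PAR1'."""
    magic = b"PAR1"
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(0, 2)  # jump to the end to learn the file size
        # Smallest possible layout: magic (4) + footer length (4) + magic (4).
        if f.tell() < 12:
            return False
        f.seek(-4, 2)  # last four bytes of the file
        tail = f.read(4)
    return head == magic and tail == magic
```

A `True` result only says the file has the Parquet framing. To see the actual rows you still need a Parquet reader, or a load into something like Snowflake as described above.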

We are using Tableau_v1.3.1.yxi as our tool for writing to Tableau Cloud. Now we only run workflows that use this tool from Desktop, and any scheduled workflows write to Snowflake, followed by an extract refresh to Tableau Cloud.

So again, not a Redshift issue. A tool issue.