I have a workflow that currently runs from a csv file, but the incoming file is set to change to a parquet file in a month or so. Is there a way to take that parquet file and convert it to a csv file? I have read through the "Will It Alteryx" article about using parquet files in a Hadoop connection, but this is different because the data is being sent to us as a file; we won't be connecting to Hive/Hadoop at all. I appreciate you guys looking at this question.
Hi @knobsdog,
I found two topics about this on the Community:
https://community.alteryx.com/t5/Engine-Works/Parquet-will-it-Alteryx/ba-p/423156
They are probably worth a look.
Not sure why my reply didn't stick but I'll send it again. I've looked through these articles but they don't address what I'm looking for. I am not connecting to Hadoop/Hive and pulling down parquet files. I have a vendor sending me data in a parquet file format via email, similar to if they emailed me a csv file, and I need to convert it to csv. I'm not sure if Alteryx can do that or not but I'm hoping someone has used it for this before.
Hi @knobsdog,
Ok I got it now.
This article covers a scenario like yours:
https://russellchristopher.com/alteryx-and-parquet-sure-why-not/
It links to a GitHub repository, and this workflow from that repository should help you:
https://github.com/russch/alteryx-parquet/blob/master/parquet_to_csv.yxmd
I haven't downloaded a workflow from GitHub in a long time, but if I remember correctly, it should work once you download it.
Hi @knobsdog
You could try pandas' read_parquet (https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html) and to_csv (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html) functions: read the parquet file into a DataFrame, then write that DataFrame out as a csv. They should work for your use case.
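To make the pandas suggestion concrete, here is a minimal sketch of a parquet-to-csv conversion. The helper name parquet_to_csv and the file name vendor_data.parquet are made up for illustration, and it assumes a Parquet engine (pyarrow or fastparquet) is installed alongside pandas. I have not tested it against your vendor's files.

```python
from pathlib import Path

import pandas as pd


def parquet_to_csv(parquet_path, csv_path=None):
    """Read a Parquet file and write its contents out as a CSV file.

    Hypothetical helper for illustration; not part of pandas or Alteryx.
    """
    parquet_path = Path(parquet_path)
    # If no output path is given, reuse the input name with a .csv extension.
    csv_path = Path(csv_path) if csv_path else parquet_path.with_suffix(".csv")
    # read_parquet requires a Parquet engine such as pyarrow or fastparquet.
    df = pd.read_parquet(parquet_path)
    # index=False keeps the DataFrame's row index out of the CSV.
    df.to_csv(csv_path, index=False)
    return csv_path


# Example call (hypothetical file name):
# parquet_to_csv("vendor_data.parquet")  # writes vendor_data.csv next to it
```

The resulting csv could then feed the existing workflow unchanged. Column types such as dates or nested fields may need extra handling, depending on what the vendor sends.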