Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Save Excel files in loop using Python and Alteryx

dbuz
7 - Meteor

Hi everyone,

 

I'm working on an Alteryx project where my input files are compressed ("gzip" format) inside which I have files without extensions. Instead of unzipping them, opening them with Excel and saving them as a worksheet manually, I figured Python could do the job for me.

However, I am not able to use some Python libraries that would easily save those files in Excel format using Python because that would require running Alteryx as an administrator and my company does not allow me to do that.

Therefore, I found a way to have Python loop over the compressed files and to push them into the Alteryx workflow, but since there are more than one file, Alteryx only saves the last one in the pre-determined path.

Would you have any ideas how to fix this? 

FYI - I am not allowed to build macros, so I have to use only Python code or Alteryx tools.

Thanks in advance!

 

Python - Alteryx WF.PNG

 

 

4 REPLIES 4
PhilipMannering
16 - Nebula
16 - Nebula

Have you tried reading in the GZip file directly? Apparently it's supported,

PhilipMannering_0-1642442303126.png

 

dbuz
7 - Meteor

Thanks for your reply, @PhilipMannering. I did not know it was possible to read the GZip files directly. 

But out of curioisity, if I wanted to use Python, would there be a way?

PhilipMannering
16 - Nebula
16 - Nebula

Hey @dbuz 

 

One way is to create an empty list and for every iteration add the new dataframe to that list. Then you end up with a list of DataFrames. You can run pd.concat(df_list) to union all those DataFrames.

 

df_list = [] #NEW CODE

inputPath = path['Path'][0]
fileExt = r".gz"
for item in os.listdir(inputPath): # loop through items in dir
    if item.endswith(fileExt): # check for ".gz" extension
        full_path = os.path.join(inputPath, item) # get full path of files
        
        df = pd.read_csv(full_path, compression='gzip', header=0, sep='|', encoding="ISO-8859-1") #from gzip to pandas df
        
        full_path = full_path.replace('.gz', '.xlsx') + "|Sheet1"
        df.loc[ : , 'path'] = full_path       
        
        df_list.append(df) #NEW CODE
        

df = pd.concat(df_list) #NEW CODE
Alteryx.write(df, 1) # WRITE NEW DATAFRAME OUTSIDE OF FOR LOOP
dbuz
7 - Meteor

Thanks a lot @PhilipMannering. I agree, the best solution would be to concatenate the dataframes in Python and then work on them with Alteryx.

Labels