Alteryx Designer Desktop Discussions


Directory tool to bring multiple files into Python

VP29
6 - Meteoroid

Hello Community,

Here is what is probably a very basic question that I have been racking my brain over for some time now:

I have a folder with 25 ".pickle" files that I want to read in with the Python tool, but I am struggling to bring in all the data. My idea is to use the Directory tool, take the 'FullPath' column, and then read all 25 files using Python. Apparently some of the underlying columns share the same header and cause havoc, so I added another section to rename the columns, as advised by @Ross_K here: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Alteryx-write-error-quot-input-must-be...

I use the code below, which runs without error, BUT it only brings back one of the 25 files. I am sure the solution is very simple, but I haven't found it yet as I am not a seasoned Python user.

from ayx import Alteryx
import pandas as pd
import numpy as np


altydata = Alteryx.read("#1")
file_list= altydata['FullPath'].to_list()

  
main_dataframe = pd.DataFrame(pd.read_pickle(file_list[0]))
  
for i in range(1,len(file_list)):
    data = pd.read_pickle(file_list[i])
    df = pd.DataFrame(data)
    main_dataframe = pd.concat([main_dataframe,df],axis=1)

# Get the full column list
columns_all = main_dataframe.columns
# Use a set to get the unique column names
columns_unique = set(columns_all)
# If the two match in length, there are no duplicates, return the df
if len(columns_all) == len(columns_unique):
    main_dataframe
# If the lengths differ, there is at least one duplicate
# Create a dictionary of those unique names to keep track of their counts
columns_unique_counter = {}
for column in columns_unique:
    columns_unique_counter[column] = 0
# Loop through the columns in the df
# Add the column names (renamed or not) to a new list
columns_new = []
for name in columns_all:
    name_counter = columns_unique_counter[name]
    if name_counter != 0:
        # This column name has been seen before
        # Rename the column in the list
        columns_new.append(f"{name}.{name_counter}")
    else:
        columns_new.append(name)
    # Whether the column was renamed or not, increment the counter
    columns_unique_counter[name] = columns_unique_counter[name] + 1
# Set the new column names
main_dataframe.columns = columns_new
#main_dataframe
#print(main_dataframe.head(10))
Alteryx.write(main_dataframe, 1)

Thanks in advance!
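
For reference, the renaming step in the code above can be pulled out into a small standalone function. This is only a sketch: `rename_duplicates` is a hypothetical helper name, not part of pandas or the ayx package. It follows the same rule as the posted code, where the second occurrence of a name gets the suffix `.1`, the third `.2`, and so on.

```python
def rename_duplicates(names):
    """Return a copy of `names` where repeated entries get a numeric suffix.

    Hypothetical helper for illustration only.
    """
    seen = {}      # how many times each name has appeared so far
    renamed = []
    for name in names:
        count = seen.get(name, 0)
        # First occurrence keeps its name; later ones become "name.<count>"
        renamed.append(f"{name}.{count}" if count else name)
        seen[name] = count + 1
    return renamed


print(rename_duplicates(["id", "value", "id", "value", "id"]))
```

With a DataFrame, the result could be assigned back via `main_dataframe.columns = rename_duplicates(main_dataframe.columns)`.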

3 REPLIES
apathetichell
19 - Altair

Lots of people here are more knowledgeable about Python than me, but this looks reasonable. Can you confirm that all the data is going in? I.e., if you add:

print(len(file_list))

does the count look reasonable?

Also, isn't len(df) acceptable, i.e. do you have to convert to a list?

VP29
6 - Meteoroid

Yeah, I tested that, the list is complete.

I converted to a list because when I used the 'FullPath' column directly, only the first result came back.

apathetichell
19 - Altair

OK, crazy question: can you try print(main_dataframe) after your loop? Random hypothesis: your loop works and you read in the files, but something else goes wrong further downstream.


from ayx import Alteryx
import pandas as pd
import numpy as np


altydata = Alteryx.read("#1")
file_list= altydata['FullPath'].to_list()



# Start at 0 so the first file is printed as well
for i in range(len(file_list)):
    print(file_list[i])

This will print every file path in the list.
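
One thing that may be worth ruling out, offered as an assumption rather than a confirmed diagnosis: `pd.concat(..., axis=1)` places frames side by side as extra columns, so the row count stays at that of a single file, whereas `axis=0` stacks them as rows. The sketch below writes three small throwaway pickle files to a temporary folder (the data is made up, and `stack_pickles` is a hypothetical helper name) and compares the two shapes.

```python
import os
import tempfile

import pandas as pd


def stack_pickles(paths):
    """Read each pickle file and stack the results row-wise (hypothetical helper)."""
    frames = [pd.DataFrame(pd.read_pickle(path)) for path in paths]
    # axis=0 appends rows; ignore_index gives one continuous row index
    return pd.concat(frames, axis=0, ignore_index=True)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Create three tiny pickle files with identical column names
    paths = []
    for n in range(3):
        path = os.path.join(tmp_dir, f"part_{n}.pickle")
        pd.DataFrame({"id": [n * 2, n * 2 + 1], "value": [n, n]}).to_pickle(path)
        paths.append(path)

    stacked = stack_pickles(paths)
    # Same files, but joined side by side as in the original loop
    side_by_side = pd.concat([pd.read_pickle(p) for p in paths], axis=1)

print("axis=0 shape:", stacked.shape)
print("axis=1 shape:", side_by_side.shape)
```

In the Python tool, the list of paths would come from the 'FullPath' column rather than a temporary folder.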
