Directory tool to bring multiple files into Python
Hello Community,
Here is a probably very basic question that I have been racking my brain over for some time now:
I have a folder with 25 ".pickle" files that I want to read in with the Python tool, but I am struggling to bring in all the data. My idea is to use the Directory tool, take the 'FullPath' column, and then load all 25 files using Python. Apparently some of the underlying columns share the same header and cause havoc, so I added another section to rename the columns, as advised by @Ross_K here: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Alteryx-write-error-quot-input-must-be...
I use the code below, which runs without error BUT only brings back one of the 25 files. I am sure the solution is very simple, but I haven't uncovered it yet as I am not a seasoned Python user.
from ayx import Alteryx
import pandas as pd
import numpy as np

altydata = Alteryx.read("#1")
file_list = altydata['FullPath'].to_list()

main_dataframe = pd.DataFrame(pd.read_pickle(file_list[0]))
for i in range(1, len(file_list)):
    data = pd.read_pickle(file_list[i])
    df = pd.DataFrame(data)
    main_dataframe = pd.concat([main_dataframe, df], axis=1)

# Get the full column list
columns_all = main_dataframe.columns
# Use a set to get the unique column names
columns_unique = set(columns_all)
# If the two match in length, there are no duplicates, return the df
if len(columns_all) == len(columns_unique):
    main_dataframe
# If the lengths differ, there is at least one duplicate
# Create a dictionary of those unique names to keep track of their counts
columns_unique_counter = {}
for column in columns_unique:
    columns_unique_counter[column] = 0
# Loop through the columns in the df
# Add the column names (renamed or not) to a new list
columns_new = []
for name in columns_all:
    name_counter = columns_unique_counter[name]
    if name_counter != 0:
        # This column name has been seen before
        # Rename the column in the list
        columns_new.append(f"{name}.{name_counter}")
    else:
        columns_new.append(name)
    # Whether the column was renamed or not, increment the counter
    columns_unique_counter[name] = columns_unique_counter[name] + 1
# Set the new column names
main_dataframe.columns = columns_new
#main_dataframe
#print(main_dataframe.head(10))
Alteryx.write(main_dataframe, 1)
Thanks in advance!
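For reference, the column-renaming step described above can be tried on its own, without Alteryx or pandas. This is only a minimal sketch: dedupe_column_names is a hypothetical helper name (it is not part of pandas or the ayx package), and the sample column names are made up. It keeps the first occurrence of a name and appends ".1", ".2", ... to later repeats, which is the same scheme as the code in the post.

```python
from collections import Counter


def dedupe_column_names(names):
    """Return a new list where repeated names get a numeric suffix.

    The first occurrence keeps its name; later ones become
    "name.1", "name.2", and so on.
    (Hypothetical helper, not part of pandas or the ayx package.)
    """
    seen = Counter()
    renamed = []
    for name in names:
        count = seen[name]  # how many times this name appeared so far
        renamed.append(f"{name}.{count}" if count else name)
        seen[name] += 1
    return renamed


# Made-up column names, as they might look after a side-by-side concat
columns = ["id", "value", "id", "value", "id"]
print(dedupe_column_names(columns))
```

With a DataFrame, the result could be assigned back with something like main_dataframe.columns = dedupe_column_names(main_dataframe.columns). Note that this sketch does not check whether a generated name such as "id.1" already exists as a real column.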
- Labels:
- Python
Lots of people here are more knowledgeable about Python than me, but this looks reasonable. Can you confirm that all the data is going in? I.e., if you add:
print(len(file_list))
does the result look reasonable?
Also, isn't len(df) acceptable, i.e. do you have to convert to a list?
Yeah, I tested that, the list is complete.
I converted to a list because when I used the 'FullPath' column directly, only the first result came back.
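On the list-versus-column point, a small sketch may help separate the two. It assumes the Directory tool output arrives as a pandas DataFrame with a 'FullPath' column; the three paths below are made up. It shows that a column (a Series) has the same length as the converted list and yields every value when iterated, so it does not by itself explain why only the first result came back.

```python
import pandas as pd

# Stand-in for the Directory tool output; these paths are made up.
altydata = pd.DataFrame({"FullPath": ["a.pickle", "b.pickle", "c.pickle"]})

paths_series = altydata["FullPath"]
paths_list = paths_series.to_list()

# Both report the same number of entries.
print(len(paths_series), len(paths_list))

# Iterating the Series directly yields every path, just like the list.
for path in paths_series:
    print(path)

# Positional access on a Series is explicit with .iloc.
first_path = paths_series.iloc[0]
print(first_path)
```

One difference worth knowing: paths_series[i] looks up by index label rather than by position, so if the incoming data does not have the default 0, 1, 2, ... index, .iloc[i] or the list is the safer choice.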
OK, crazy question: can you try print(main_dataframe) after your loop? Random hypothesis: your loop works and you read in the files, but something else goes wrong further downstream.
from ayx import Alteryx
import pandas as pd
import numpy as np
altydata = Alteryx.read("#1")
file_list= altydata['FullPath'].to_list()
for i in range(1, len(file_list)):
    print(file_list[i])
will print the path of each file except the first one, since the loop starts at index 1.
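To test the "loop works, something goes wrong downstream" hypothesis, it may help to print the shape of each file as it is read and the shape of the combined result. The sketch below is self-contained and runs outside Alteryx: it writes three small made-up pickle files to a temporary folder instead of using Alteryx.read, so file_list here is a stand-in for the real 'FullPath' values. It also compares axis=1 (frames placed side by side, as in the original code) with axis=0 (frames stacked as rows). Whether stacking is what is wanted is an assumption, since the thread does not say how the 25 files relate to each other.

```python
import os
import tempfile

import pandas as pd

# Build three small pickle files as stand-ins for the real ones (made-up data).
tmp_dir = tempfile.mkdtemp()
file_list = []
for n in range(3):
    path = os.path.join(tmp_dir, f"part_{n}.pickle")
    pd.DataFrame({"id": [n, n + 10], "value": [n * 1.5, n * 2.5]}).to_pickle(path)
    file_list.append(path)

# Read every file (index 0 included) and report what each one contains.
frames = []
for path in file_list:
    df = pd.DataFrame(pd.read_pickle(path))
    print(os.path.basename(path), df.shape)
    frames.append(df)

# axis=1 places the frames side by side (more columns, repeated headers);
# axis=0 stacks them on top of each other (more rows, same headers).
side_by_side = pd.concat(frames, axis=1)
stacked = pd.concat(frames, axis=0, ignore_index=True)
print("axis=1:", side_by_side.shape)
print("axis=0:", stacked.shape)
```

If the per-file shapes look right but the combined shape has the row count of a single file and many more columns, the data is all there and has simply been laid out horizontally.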
