There aren't many challenges to dynamic processing in Alteryx. Unfortunately, reading Excel files with unknown and changing sheet names is one such challenge.
Skip to the bottom to download the installer and exemplar.
This is a challenge I've tackled in the past using a Python script. In the current version, I used the Python tool and rewrote my script to use pandas. The logic is completely encapsulated in the Python tool. This makes this look and feel more like an out-of-the-box Alteryx tool, especially since the process is wrapped in a macro with a custom icon! In part 2, I will discuss the macro in depth. My macro is depicted below.
Simply feed a list of Excel full paths into the macro and it outputs sheet names for each file. Use the Excel Data Fetcher macro to get all of the data from the Excel files. In part 3, i will release some additional exemplars and macros.
Watch this input
turn into this output
in moments.
That's the magic! Take that output and feed it into another macro and watch all the data come out dynamically.
For a closer look at the Python script in the macro, keep reading.
I wrote and rewrote the Python script entirely from scratch. I have enclosed it below. In part 4, I will delve further into the Python script, why I made certain design choices, and some performance metrics.
#################################
from ayx import Package
#################################
# list all non-standard packages to be imported
from ayx import Alteryx
import pandas as pd
import openpyxl # read xlsx and xlsm
import xlrd # read xls
from pyxlsb import open_workbook # read xlsb
import warnings
#################################
# read in data from input anchor (after running the workflow)
df_in = Alteryx.read("#1")
#################################
df_out = []
# 'main' function
for row in df_in.itertuples():
if row.function_call == 'fetch_xlsx_xlsm_sheets':
'''
openpyxl can throw a nuisance warning for some files
with headers or footers. Ignore the warning.
'''
warnings.filterwarnings("ignore",category=UserWarning)
temp_workbook_object = openpyxl.load_workbook(row.FullPath)
sheet_list = temp_workbook_object.sheetnames
elif row.function_call == 'fetch_xls_sheets':
temp_workbook_object = xlrd.open_workbook(row.FullPath)
sheet_list = temp_workbook_object.sheet_names()
elif row.function_call == 'fetch_xlsb_sheets':
with open_workbook(row.FullPath) as temp_workbook_object:
sheet_list = temp_workbook_object.sheets
else:
None
sheet_series = pd.Series(sheet_list)
df_out.append(pd.DataFrame({'FullPath': row.FullPath, 'SheetName': sheet_series.values}))
# create the output dataframe comprising
# any intermediate dataframes from 'main' function
df_out = pd.concat(df_out, ignore_index=True, sort=False)
#################################
# read in metadata from input anchor
md = Alteryx.readMetadata('#1')
# drop metadata for unused field
del md['function_call']
# add metadata for new field
md['SheetName'] = {
'name': 'SheetName',
'type': 'V_WString',
'length': 1073741823,
'description': 'script author: Kacper Ksieski'
}
md
#################################
# write out the data with updated metadata to output 1
Alteryx.write(df_out, 1, md)
Here is the install script. Make sure to run the installer in Designer in Admin Mode.
#################################
# List all non-standard packages
# only missing packages will be installed
from ayx import Package
Package.installPackages(['pandas','xlrd','openpyxl','pyxlsb'])
In part 2, I will discuss the macro in depth.
In part 3, i will release some additional exemplars and macros.
In part 4, I will delve further into the Python script, why I made certain design choices, and some performance metrics.
Stay tuned for the next posts and leave feedback! Thank you.
Solved! Go to Solution.
Hi @seven
I only edited the fullpath in Macro Input and there's an error in the Python tool
Error Message
"Python (84) Traceback (most recent call last): ¶ File "C:\Users\delarar\AppData\Local\Temp\Engine_5956_af7f2689970746b7b5a287d85d9b26b5_\d9663234-4494-422a-a139-a402f7427abf\workbook.py", line 9, in <module> ¶ import openpyxl # read xlsx and xlsm ¶ModuleNotFoundError: No module named 'openpyxl' ¶
"
Did you install openpyxl?
The installation is mentioned at the bottom of the post. You must run the installation workflow in an elevated session.
you can do this without using python for xlsx and xlsm by creating a macro such that xlsx is a direct and xlsm would be in zip extraction...
Python would be helpful in listing xls and xlsb
Hi @Shalz
the point of this post is present a solution for ALL Excel file types. Yes, there is Python under the hood. It is abstracted away from the user. The user runs an Alteryx workflow to install dependencies and another workflow in day to day use for fetching sheet names.