There aren't many challenges to dynamic processing in Alteryx. Unfortunately, reading Excel files with unknown and changing sheet names is one such challenge.
Skip to the bottom to download the installer and exemplar.
This is a challenge I've tackled in the past using a Python script. In the current version, I used the Python tool and rewrote my script to use pandas. The logic is completely encapsulated in the Python tool. This makes this look and feel more like an out-of-the-box Alteryx tool, especially since the process is wrapped in a macro with a custom icon! In part 2, I will discuss the macro in depth. My macro is depicted below.
Simply feed a list of Excel full paths into the macro and it outputs sheet names for each file. Use the Excel Data Fetcher macro to get all of the data from the Excel files. In part 3, i will release some additional exemplars and macros.
Watch this input
turn into this output
in moments.
That's the magic! Take that output and feed it into another macro and watch all the data come out dynamically.
For a closer look at the Python script in the macro, keep reading.
I wrote and rewrote the Python script entirely from scratch. I have enclosed it below. In part 4, I will delve further into the Python script, why I made certain design choices, and some performance metrics.
#################################
from ayx import Package
#################################
# list all non-standard packages to be imported
from ayx import Alteryx
import pandas as pd
import openpyxl # read xlsx and xlsm
import xlrd # read xls
from pyxlsb import open_workbook # read xlsb
import warnings
#################################
# read in data from input anchor (after running the workflow)
df_in = Alteryx.read("#1")
#################################
df_out = []
# 'main' function
for row in df_in.itertuples():
if row.function_call == 'fetch_xlsx_xlsm_sheets':
'''
openpyxl can throw a nuisance warning for some files
with headers or footers. Ignore the warning.
'''
warnings.filterwarnings("ignore",category=UserWarning)
temp_workbook_object = openpyxl.load_workbook(row.FullPath)
sheet_list = temp_workbook_object.sheetnames
elif row.function_call == 'fetch_xls_sheets':
temp_workbook_object = xlrd.open_workbook(row.FullPath)
sheet_list = temp_workbook_object.sheet_names()
elif row.function_call == 'fetch_xlsb_sheets':
with open_workbook(row.FullPath) as temp_workbook_object:
sheet_list = temp_workbook_object.sheets
else:
None
sheet_series = pd.Series(sheet_list)
df_out.append(pd.DataFrame({'FullPath': row.FullPath, 'SheetName': sheet_series.values}))
# create the output dataframe comprising
# any intermediate dataframes from 'main' function
df_out = pd.concat(df_out, ignore_index=True, sort=False)
#################################
# read in metadata from input anchor
md = Alteryx.readMetadata('#1')
# drop metadata for unused field
del md['function_call']
# add metadata for new field
md['SheetName'] = {
'name': 'SheetName',
'type': 'V_WString',
'length': 1073741823,
'description': 'script author: Kacper Ksieski'
}
md
#################################
# write out the data with updated metadata to output 1
Alteryx.write(df_out, 1, md)
Here is the install script. Make sure to run the installer in Designer in Admin Mode.
#################################
# List all non-standard packages
# only missing packages will be installed
from ayx import Package
Package.installPackages(['pandas','xlrd','openpyxl','pyxlsb'])
In part 2, I will discuss the macro in depth.
In part 3, i will release some additional exemplars and macros.
In part 4, I will delve further into the Python script, why I made certain design choices, and some performance metrics.
Stay tuned for the next posts and leave feedback! Thank you.
Solved! Go to Solution.
Hi @seven was there a reason you didn't make use of the wildcard xlsx tool that is available on the gallery, or build your own custom tool using the code free tools that exist in the product (with the 'input <list of sheet names>' and then a batch macro)?
I'm just trying to understand if you identified gaps in the way these tools worked and hence went the python route, or you just wanted to showcase another method of doing it?
@BenMossGreat question. I haven't seen the wildcard xlsx macro to which you refer. A search for "wildcard" or "xls" on the Gallery does not yield results that appear to be what you describe. See images below:
Why didn't I use code free tools that exist in the product? That is a limitation of Microsoft, not Alteryx. Microsoft Excel files of type (xlsx, xlsm) allow Alteryx to fetch a list of sheet names as you mentioned. Microsoft Excel files of type (xls, xlsb) do not.This is a key part of my solution and I apologise that I did not include it in my post.
As far as I know, my solution is the only one that operates on all four Microsoft Excel types (xlsx, xlsm, xls, xlsb) in a way that is seamless to the Alteryx end user.
Perfect! That's what I had hoped, I know a lot of people get frustrated with this restriction with xls files!
I'll make sure to bookmark this for those cases!
Ben
New post, part 2!
Click here.
New post, part 3!
Click here.
New post, part 4!
Click here.
Could you please suggest on how we can embed this in Crew macro ? when I tried to embed in crew runner macro, it is failing .
Hi,
Why I'm getting a "Traceback Error in the Python tool"
Thanks.
Impossible to understand without screenshots, examples, more information, what did you do, how did you set it up, etc.