Alteryx Designer Desktop Discussions

seven · ‎05-05-2020

There aren't many challenges to dynamic processing in Alteryx. Unfortunately, reading Excel files with unknown and changing sheet names is one such challenge.

Skip to the bottom to download the installer and exemplar.

This is a challenge I've tackled in the past using a Python script. In the current version, I used the Python tool and rewrote my script to use pandas. The logic is completely encapsulated in the Python tool. This makes this look and feel more like an out-of-the-box Alteryx tool, especially since the process is wrapped in a macro with a custom icon! In part 2, I will discuss the macro in depth. My macro is depicted below.

Simply feed a list of Excel full paths into the macro and it outputs sheet names for each file. Use the Excel Data Fetcher macro to get all of the data from the Excel files. In part 3, i will release some additional exemplars and macros.

Watch this input

turn into this output

in moments.

That's the magic! Take that output and feed it into another macro and watch all the data come out dynamically.

For a closer look at the Python script in the macro, keep reading.

I wrote and rewrote the Python script entirely from scratch. I have enclosed it below. In part 4, I will delve further into the Python script, why I made certain design choices, and some performance metrics.

#################################
from ayx import Package


#################################
# list all non-standard packages to be imported
from ayx import Alteryx
import pandas as pd
import openpyxl # read xlsx and xlsm
import xlrd # read xls
from pyxlsb import open_workbook # read xlsb
import warnings


#################################
# read in data from input anchor (after running the workflow)
df_in = Alteryx.read("#1")


#################################
df_out = []

# 'main' function
for row in df_in.itertuples():
    if row.function_call == 'fetch_xlsx_xlsm_sheets':
        '''
        openpyxl can throw a nuisance warning for some files 
        with headers or footers. Ignore the warning.
        '''
        warnings.filterwarnings("ignore",category=UserWarning)
        temp_workbook_object = openpyxl.load_workbook(row.FullPath)
        sheet_list = temp_workbook_object.sheetnames
    elif row.function_call == 'fetch_xls_sheets':
        temp_workbook_object = xlrd.open_workbook(row.FullPath)
        sheet_list = temp_workbook_object.sheet_names()
    elif row.function_call == 'fetch_xlsb_sheets':
        with open_workbook(row.FullPath) as temp_workbook_object:
            sheet_list = temp_workbook_object.sheets
    else:
        None
    sheet_series = pd.Series(sheet_list)
    df_out.append(pd.DataFrame({'FullPath': row.FullPath, 'SheetName': sheet_series.values}))

# create the output dataframe comprising 
# any intermediate dataframes from 'main' function
df_out = pd.concat(df_out, ignore_index=True, sort=False)


#################################
# read in metadata from input anchor
md = Alteryx.readMetadata('#1')
# drop metadata for unused field
del md['function_call']
# add metadata for new field
md['SheetName'] = {
    'name': 'SheetName', 
    'type': 'V_WString', 
    'length': 1073741823,
    'description': 'script author: Kacper Ksieski'
}
md


#################################
# write out the data with updated metadata to output 1
Alteryx.write(df_out, 1, md)

Here is the install script. Make sure to run the installer in Designer in Admin Mode.

#################################
# List all non-standard packages 
# only missing packages will be installed
from ayx import Package
Package.installPackages(['pandas','xlrd','openpyxl','pyxlsb'])

In part 2, I will discuss the macro in depth.

In part 3, i will release some additional exemplars and macros.

In part 4, I will delve further into the Python script, why I made certain design choices, and some performance metrics.

Stay tuned for the next posts and leave feedback! Thank you.

arjundelara · ‎04-01-2021

Hi @seven

I edit the fullpath to my directory for .xlsb input

arjundelara · ‎04-01-2021

Hi @seven

I only edited the fullpath in Macro Input and there's an error in the Python tool

Error Message

"Python (84) Traceback (most recent call last): ¶ File "C:\Users\delarar\AppData\Local\Temp\Engine_5956_af7f2689970746b7b5a287d85d9b26b5_\d9663234-4494-422a-a139-a402f7427abf\workbook.py", line 9, in <module> ¶ import openpyxl # read xlsx and xlsm ¶ModuleNotFoundError: No module named 'openpyxl' ¶
"

seven · ‎04-02-2021

Did you install openpyxl?

arjundelara · ‎04-06-2021

HI @seven ,

Not yet. How to install it openpyxl?

seven · ‎04-06-2021

The installation is mentioned at the bottom of the post. You must run the installation workflow in an elevated session.

Shalz · ‎04-06-2021

you can do this without using python for xlsx and xlsm by creating a macro such that xlsx is a direct and xlsm would be in zip extraction...

Python would be helpful in listing xls and xlsb

seven · ‎04-06-2021

Hi @Shalz

the point of this post is present a solution for ALL Excel file types. Yes, there is Python under the hood. It is abstracted away from the user. The user runs an Alteryx workflow to install dependencies and another workflow in day to day use for fetching sheet names.

arjundelara · ‎04-06-2021

Hi @seven ,

Sorry, but how run the installation workflow in an elevated session.?

Alteryx Designer Desktop Discussions

Alteryx can dynamically pull Excel sheet names!

Re: Row creation

Re: How to select columns dynamically using number...

Re: Batch macro to read 1000+ .xlsx files with var...

Re: Issue when using Block Until Done and Power BI...

Example workflow for setting up a custom list to u...