Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Alteryx can dynamically pull Excel sheet names!

seven
12 - Quasar

There aren't many challenges to dynamic processing in Alteryx. Unfortunately, reading Excel files with unknown and changing sheet names is one such challenge.

 

Skip to the bottom to download the installer and exemplar.

 

This is a challenge I've tackled in the past using a Python script. In the current version, I used the Python tool and rewrote my script to use pandas. The logic is completely encapsulated in the Python tool. This makes this look and feel more like an out-of-the-box Alteryx tool, especially since the process is wrapped in a macro with a custom icon! In part 2, I will discuss the macro in depth. My macro is depicted below.

seven_1-1588716550673.png

Simply feed a list of Excel full paths into the macro and it outputs sheet names for each file. Use the Excel Data Fetcher macro to get all of the data from the Excel files. In part 3, i will release some additional exemplars and macros.

 

seven_0-1588716088165.png

Watch this input

seven_5-1588717672126.png

turn into this output

seven_4-1588717640294.png

in moments.

 

That's the magic! Take that output and feed it into another macro and watch all the data come out dynamically.

 

For a closer look at the Python script in the macro, keep reading.

I wrote and rewrote the Python script entirely from scratch. I have enclosed it below. In part 4, I will delve further into the Python script, why I made certain design choices, and some performance metrics.

#################################
from ayx import Package


#################################
# list all non-standard packages to be imported
from ayx import Alteryx
import pandas as pd
import openpyxl # read xlsx and xlsm
import xlrd # read xls
from pyxlsb import open_workbook # read xlsb
import warnings


#################################
# read in data from input anchor (after running the workflow)
df_in = Alteryx.read("#1")


#################################
df_out = []

# 'main' function
for row in df_in.itertuples():
    if row.function_call == 'fetch_xlsx_xlsm_sheets':
        '''
        openpyxl can throw a nuisance warning for some files 
        with headers or footers. Ignore the warning.
        '''
        warnings.filterwarnings("ignore",category=UserWarning)
        temp_workbook_object = openpyxl.load_workbook(row.FullPath)
        sheet_list = temp_workbook_object.sheetnames
    elif row.function_call == 'fetch_xls_sheets':
        temp_workbook_object = xlrd.open_workbook(row.FullPath)
        sheet_list = temp_workbook_object.sheet_names()
    elif row.function_call == 'fetch_xlsb_sheets':
        with open_workbook(row.FullPath) as temp_workbook_object:
            sheet_list = temp_workbook_object.sheets
    else:
        None
    sheet_series = pd.Series(sheet_list)
    df_out.append(pd.DataFrame({'FullPath': row.FullPath, 'SheetName': sheet_series.values}))

# create the output dataframe comprising 
# any intermediate dataframes from 'main' function
df_out = pd.concat(df_out, ignore_index=True, sort=False)


#################################
# read in metadata from input anchor
md = Alteryx.readMetadata('#1')
# drop metadata for unused field
del md['function_call']
# add metadata for new field
md['SheetName'] = {
    'name': 'SheetName', 
    'type': 'V_WString', 
    'length': 1073741823,
    'description': 'script author: Kacper Ksieski'
}
md


#################################
# write out the data with updated metadata to output 1
Alteryx.write(df_out, 1, md)

 

Here is the install script. Make sure to run the installer in Designer in Admin Mode.

#################################
# List all non-standard packages 
# only missing packages will be installed
from ayx import Package
Package.installPackages(['pandas','xlrd','openpyxl','pyxlsb'])

 

In part 2, I will discuss the macro in depth.

In part 3, i will release some additional exemplars and macros.

In part 4, I will delve further into the Python script, why I made certain design choices, and some performance metrics.

 

Stay tuned for the next posts and leave feedback! Thank you.

 

17 REPLIES 17
arjundelara
8 - Asteroid

Hi @seven 

 

arjundelara_0-1617332226971.png

I edit the fullpath to my directory for .xlsb input

arjundelara
8 - Asteroid

Hi @seven 

 

I only edited the fullpath in Macro Input and there's an error in the Python tool 

 

Error Message

"Python (84) Traceback (most recent call last): ¶ File "C:\Users\delarar\AppData\Local\Temp\Engine_5956_af7f2689970746b7b5a287d85d9b26b5_\d9663234-4494-422a-a139-a402f7427abf\workbook.py", line 9, in <module> ¶ import openpyxl # read xlsx and xlsm ¶ModuleNotFoundError: No module named 'openpyxl' ¶
"

seven
12 - Quasar

Did you install openpyxl?

arjundelara
8 - Asteroid

HI @seven ,

 

Not yet. How to install it openpyxl?

seven
12 - Quasar

The installation is mentioned at the bottom of the post. You must run the installation workflow in an elevated session.

Shalz
8 - Asteroid

 you can do this without using python for xlsx and xlsm by creating a macro such that xlsx is a direct and xlsm would be in zip extraction...

Python would be helpful in listing xls and xlsb

seven
12 - Quasar

Hi @Shalz 

 

the point of this post is present a solution for ALL Excel file types. Yes, there is Python under the hood. It is abstracted away from the user. The user runs an Alteryx workflow to install dependencies and another workflow in day to day use for fetching sheet names.

arjundelara
8 - Asteroid

Hi @seven ,

 

Sorry, but how run the installation workflow in an elevated session.?

Labels