Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Python Tool - Dataframe with Keys, Exporting to Excel Workbook Worksheets

cbolin
7 - Meteor

Hi,

 

I'm using Alteryx Admin Designer, Version 2020.0.5.25447.

 

I ported all my Python code into the Python tool. The code works in Alteryx.

  1. Loops through nested folders to build list of file paths meeting criteria.
  2. From each file path I extract two pieces of information: station number and a folder name E.g. “stat_560’ and “B100”.
  3. Code loops through all files, locating those related to the two criteria above.
  4. Parses all the data files related to the 'station' and 'folder' into a Python dictionary object. Concatenates station and folder to produce a sheet name (e.g. stat560 and B100 become 'stat560_B100'. Adds sheet name to the dictionary object.
  5. Converts list of dictionaries to panda dataframe (See attached file 'screenshot.png'.

Primary Method to Export (FAILED)

 

In the original code, I used a pandas function called ExcelWriter (from pandas import ExcelWriter). The code created a single Excel workbook, and placed data into each sheet using the 'sheetname' in the dataframe. There should be about 50 worksheets after the data is subdivided.

 

This line threw an error stating “openpyxl” did not exist. On another machine I had to “pip install openpyxl” to correct this.  I added this line to my code: Package.installPackages(package="openpyxl")

 

It failed, it was late, so I stopped.

 

Is there another way to install this 'openpyxl'? Is it supported by this version of Alteryx?

 

Alternate Method (I'm not certain how to do this)

0. Create empty Excel file.

1. Iterate through the Python Tool output file.

2. Extract data from this output using the field 'Sheetname'.

3. Write each extract to a worksheet and rename that sheet to the field 'Sheetname'.

4. Repeat for all ~50 datasets so there is a single Workbook with about 50 sheets.

 

I'm looking for a solution to either method.

 

Regards,

Chuck

 

5 REPLIES 5
morr-co
10 - Fireball

Hi @cbolin - I'm not sure if this is the issue (and you may have tried this already) but I think it could be related to your install line: Package.installPackages(package="openpyxl"). I believe the package names have to be passed as a list. So, it would be:

 

Package.installPackages(['openpyxl'])

 

Alternatively, if I'm understanding the process correctly, I don't see anything that you couldn't accomplish using native Alteryx tools and making a few macros. High level, this might look like:

 

  1. Use a Directory input tool to read the folder and subfolder contents
  2. Use a Filter tool to limit the files in the Directory output to those that match your naming criteria.
  3. Create a batch macro that accepts a file path and reads the sheet names in the file
  4. Create batch macro that will read the data from a specified excel file and sheet
cbolin
7 - Meteor

Hi,

Thanks for your input. I passed the value as a string and also within a list, producing the same error messages.

 

I agree that Alteryx native tools can do a lot for my solution. Regarding Macros, I am unfamiliar with writing macros inside Alteryx. Is this a scripting language that can be recorded and/or written like VBA in Excel?

 

In regards to your four steps, there is a step 2.5 that requires extracting keys and values from data files that uses indentation (tabs) to define the structure. There are 11 kinds of data files, corresponding to each of the stations. An Alteryx technical person created a workflow to extract two of the many pieces of information. However, extracting all of the data would have required an unnecessarily complex workflow of RegEx expressions. I calculated 800+ to deal with the 11 variations. (NOTE: I'm an Alteryx neophyte so I make be speaking out of my backside).

 

Writing my response to you has sparked an idea or two. I'll make another posting.

 

Thanks. Please share a link to macro coding examples.

 

 

morr-co
10 - Fireball

Hi @cbolin :

 

The good news is that macros are just another workflow that you create - no code required. Once you've created the workflow, they can be used in other workflows like any other tool. I'd say it's almost like creating a method when you are trying to encapsulate a task that you do a lot. Macros can also be configured to loop if you want to dynamically pass variables (i.e. a filename or sheet name) and have the operations re-run. There are 3 main types of macros and I've included links to 2 videos that helped me understand them.

 

  • Standard Macro (no looping, only run once)
  • Batch Macro (for loop)
  • Iterative Macro (while loop)
  • Technically there is a 4th macro for location optimization. From what I've read, this is more of a "legacy" feature and the recommendation is to build with iterative macros.

I have built macros to execute different operations based on the file type but it sounds like this could be a lot of additional effort given where you are at in your project. Maybe a more incremental approach would be to convert pieces of the workflow to a macro while still running the Python scripts to parse the data; it may also be food for thought for another day. 

 

I also attempted to install the openpyxl module in my designer but encountered the same error. Unfortunately, I don't have any suggestions there. I think there may be a dependency that isn't currently supported. I have had a lot of struggles with the Python tool and managing the VM which is why I try to convert a lot of it to macros or things that use the Alteryx tools. Would love to hear the solution if you are able to find one. 

cbolin
7 - Meteor

Hi,

Thanks for the detailed explanation of macros. I will look at the videos because I am already duplicating portions of workflows. You answered a concern by dynamic passing of variables.

 

Macros offer the possibility of chaining RegEx expressions in the indented data file structure.

 

I appreciate your efforts to replicate my failure with the "openpyxl" file.  About that, I  got it working within the past couple of hours. I'll post the solution as another reply.

 

Thanks for your support!

 

 

cbolin
7 - Meteor

I'm embarrassed to address my mistake. But here it is...

 

I run Python on my local machine using Conda and virtual environments. This slipped my mind while using Alteryx Designer as I tried executing this line of code:

 

      "Package.installPackages(package=['openpyxl'])"

 

This link provides the information and the one hint I overlooked.

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/How-To-Use-Alteryx-installPackages-...

 

Once I received admin rights I ran AD as an administrator and this dependency for  pandas ExcelWriter worked perfectly.  BTW, I'm including some Python code to show how I used this function.

 

with ExcelWriter(excel_path) as writer:    
    for k, v in df_dic.items():
        sheetname = k[0].upper() + k[1][0].upper()
        data = pd.DataFrame(df_dic[k])
        if data.shape[0] > 0:
            data = data.T        
            data.columns = data.loc['typedata',:]
            data.drop('typedata',axis=0, inplace=True)
            data.columns.name = ""
            data.to_excel(writer, sheetname) # data.T.to_excel(writer, sheetname)
    writer.save()      

 

 

 

Labels