Alteryx Designer Desktop Discussions

SideOfRanch · 08-17-2021

I followed this simple guide and its Python code, and it works fine locally, but it errors on line 22 when I try to publish:

Dynamically read Zip file contents using Alteryx – Karthik's BI Musings (karthikvankadara.com)

It gets to validating and then bombs with this long error, which appears to be related to using the .any() method in Python.

I've had good success with similar workflows, so I'm really struggling with this one. Any ideas?

Error:

RuntimeError Traceback (most recent call last)
in
----> 1 data = Alteryx.read('#1')
e:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\ayx\export.py in read(incoming_connection_name, debug, **kwargs)
33 When running the workflow in Alteryx, this function will convert incoming data streams to pandas dataframes when executing the code written in the Python tool. When called from the Jupyter notebook interactively, it will read in a copy of the incoming data that was cached on the previous run of the Alteryx workflow.
34 """
---> 35 return __CachedData__(debug=debug).read(incoming_connection_name, **kwargs)
36
37
e:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\ayx\CachedData.py in read(self, incoming_connection_name)
304 try:
305 # get the data from the sql db (if only one table exists, no need to specify the table name)
--> 306 data = db.getData()
307 # print success message
308 print("".join(["SUCCESS: ", msg_action]))
e:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\ayx\Datafiles.py in getData(self, data, metadata)
498 if data is None:
499 # read in data as a list of numpy ndarrays
--> 500 data = self.connection.read_nparrays()
501 # check if data is a list of numpy structs
502 elif isinstance(data, list) and all(
RuntimeError: DataWrap2WrigleyDb::GoRecord: Attempt to seek past the end of the file
[ME_DI_20210817_ClarifyCompNetworkZipLoad.yxmd] Tool 7 - ---------------------------------------------------------------------------
NameError Traceback (most recent call last)
in
1 # we need to use .any() component as data read would be in the form
2 # of record set and we need to take one row out of it
----> 3 zip_file_path = data['FullPath'].any()
4 temp_file_path = data['TempPath'].any()
5
NameError: name 'data' is not defined
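The second traceback (the NameError) looks like a knock-on effect of the first one rather than a separate problem: the Python tool runs the notebook cells in order against one shared namespace, so when Alteryx.read('#1') raises the RuntimeError, `data` is never assigned, and the next cell fails before .any() is even reached. A minimal plain-Python sketch of that behaviour, assuming cells are simply executed one after another in a shared dictionary of globals (`run_cells` and `failing_read` are hypothetical stand-ins, not part of the ayx package):

```python
# Hypothetical illustration: run "cells" in order in one shared namespace,
# keep going after a failure, and record which error each cell raised.

def failing_read():
    # Stands in for Alteryx.read('#1') raising on the server.
    raise RuntimeError("Attempt to seek past the end of the file")


def run_cells(cells):
    namespace = {"failing_read": failing_read}
    errors = []
    for source in cells:
        try:
            exec(source, namespace)
            errors.append(None)
        except Exception as exc:  # record the error type and continue with the next cell
            errors.append(type(exc).__name__)
    return errors


cells = [
    "data = failing_read()",                   # cell 1: the read fails, so data is never bound
    "zip_file_path = data['FullPath'].any()",  # cell 2: data does not exist -> NameError
]

print(run_cells(cells))  # ['RuntimeError', 'NameError']
```

If that reading is right, the thing to chase is why Alteryx.read('#1') cannot read its input during validation on the server, rather than the .any() lines themselves.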

The only thing slightly different from the guide is that my path field is called FullPath instead of just "Path".

I would upload the workflow, but my organization blocks file uploads. Here is my exact Python code:

# List all non-standard packages to be imported by your 
# script here (only missing packages will be installed)
from ayx import Package
#Package.installPackages(['pandas','numpy'])

#################################
# List all non-standard packages to be imported by your 
# script here (only missing packages will be installed)

#Package.installPackages(['pandas','numpy'])
from csv import reader
import pandas as pd
import os
from zipfile import ZipFile
from ayx import Alteryx

data = Alteryx.read('#1')

# we need to use .any() component as data read would be in the form 
# of record set and we need to take one row out of it
zip_file_path = data['FullPath'].any()
temp_file_path = data['TempPath'].any()

#declare variables
extensions = ['.csv']

#################################
# Create a ZipFile Object and load *.zip in it
with ZipFile(zip_file_path, 'r') as zipObj:
    # Extract all the contents of the zip file into the TempPath directory
    zipObj.extractall(temp_file_path)


#################################
# variable to hold list of paths for the extracted csv files
file_list = []

#loop through all the files in the directory, including subfolders
for root, dirnames, filenames in os.walk(temp_file_path):
    for file in filenames:
        #get file extension and name
        fname, fext = os.path.splitext(file)

        #keep the file only if its extension (any case) is one of the desired extensions
        if fext.lower() in extensions:
            #remember the full path of the extracted file
            filepath = os.path.join(root, file)
            file_list.append(filepath)

# get the combined data (ignore_index gives the result one continuous row index)
combined_csv = pd.concat([pd.read_csv(f) for f in file_list], ignore_index=True)

# delete the extracted csv files post processing
# (file_list also covers csv files that were extracted into subfolders)
for filepath in file_list:
    os.remove(filepath)

#################################
Alteryx.write(combined_csv, 1)
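As an alternative sketch (not the approach from the guide), the CSV members can be read straight out of the archive with ZipFile.open, which skips the temp folder and the delete step entirely. It also takes the single path with .iloc[0] instead of .any(), which states the intent ("first row") directly. The function name combine_zip_csvs and the assumption that the one-row input carries a FullPath column are illustrative; inside the Python tool you would presumably call Alteryx.write(combine_zip_csvs(Alteryx.read('#1')), 1), but that wiring is untested here.

```python
import os
import tempfile
from zipfile import ZipFile

import pandas as pd


def combine_zip_csvs(data: pd.DataFrame, path_column: str = "FullPath") -> pd.DataFrame:
    """Read every .csv member of the zip named in the first row and stack them."""
    # Take the single value out of the one-row input explicitly.
    zip_file_path = data[path_column].iloc[0]

    frames = []
    with ZipFile(zip_file_path, "r") as zip_obj:
        for member in zip_obj.namelist():
            # Skip folder entries and anything that is not a CSV (case-insensitive).
            if os.path.splitext(member)[1].lower() != ".csv":
                continue
            # Open the member as a file-like object; nothing is extracted to disk.
            with zip_obj.open(member) as handle:
                frames.append(pd.read_csv(handle))

    if not frames:
        raise ValueError(f"no .csv files found in {zip_file_path}")
    return pd.concat(frames, ignore_index=True)


# Self-contained demo: build a small zip in a throwaway folder and read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "sample.zip")
    with ZipFile(demo_zip, "w") as z:
        z.writestr("a.csv", "id,value\n1,10\n2,20\n")
        z.writestr("sub/b.CSV", "id,value\n3,30\n")
        z.writestr("readme.txt", "not a csv")
    result = combine_zip_csvs(pd.DataFrame({"FullPath": [demo_zip]}))

print(result)
```

Because nothing is extracted, the TempPath field and the cleanup loop are no longer needed with this variant.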

cmcclellan · 08-18-2021

It looks like it's failing on this line:

temp_file_path = data['TempPath'].any()

but this line was fine:

zip_file_path = data['FullPath'].any()

So do you have input fields named FullPath and TempPath?
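One way to answer that from inside the tool is to check the incoming column names before using them, so a missing field produces a readable message instead of a bare KeyError. A minimal sketch, assuming the frame returned by Alteryx.read('#1') is an ordinary pandas DataFrame; the helper name require_columns and the example paths are hypothetical:

```python
import pandas as pd


def require_columns(data: pd.DataFrame, *names: str) -> None:
    """Raise a readable error if an expected field is missing or there are no rows."""
    missing = [name for name in names if name not in data.columns]
    if missing:
        raise KeyError(
            f"missing input field(s) {missing}; available: {list(data.columns)}"
        )
    if data.empty:
        raise ValueError("the input connection delivered no rows")


# Example frame with the two fields the script uses (the paths are made up).
incoming = pd.DataFrame(
    {"FullPath": [r"C:\data\files.zip"], "TempPath": [r"C:\data\tmp"]}
)
require_columns(incoming, "FullPath", "TempPath")  # both fields present: no error

# Dropping TempPath reproduces the situation the question above is probing.
try:
    require_columns(incoming.drop(columns="TempPath"), "FullPath", "TempPath")
    problem = None
except KeyError as exc:
    problem = str(exc)
print(problem)
```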

SideOfRanch · 08-18-2021

Yeah, I'm using a Directory tool, which outputs the FullPath column, and then a Formula tool to add the TempPath field. If I browse after the Formula tool, I can see all the columns just fine. It also works locally and only gives this error on the server, which is quite strange.

Python won't validate on publish - simple script loading unzip and load CSV files