I followed this simple guide and its Python code, and it works fine locally, but it errors on line 22 when I try to publish:
Dynamically read Zip file contents using Alteryx – Karthik's BI Musings (karthikvankadara.com)
It gets to the validation step and then fails with a long error that appears to be related to the .any() call in Python.
I've had good success with similar workflows, so I'm really struggling with this one. Any ideas?
Error:
RuntimeError Traceback (most recent call last)
in
----> 1 data = Alteryx.read('#1')
e:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\ayx\export.py in read(incoming_connection_name, debug, **kwargs)
33 When running the workflow in Alteryx, this function will convert incoming data streams to pandas dataframes when executing the code written in the Python tool. When called from the Jupyter notebook interactively, it will read in a copy of the incoming data that was cached on the previous run of the Alteryx workflow.
34 """
---> 35 return __CachedData__(debug=debug).read(incoming_connection_name, **kwargs)
36
37
e:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\ayx\CachedData.py in read(self, incoming_connection_name)
304 try:
305 # get the data from the sql db (if only one table exists, no need to specify the table name)
--> 306 data = db.getData()
307 # print success message
308 print("".join(["SUCCESS: ", msg_action]))
e:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\ayx\Datafiles.py in getData(self, data, metadata)
498 if data is None:
499 # read in data as a list of numpy ndarrays
--> 500 data = self.connection.read_nparrays()
501 # check if data is a list of numpy structs
502 elif isinstance(data, list) and all(
RuntimeError: DataWrap2WrigleyDb::GoRecord: Attempt to seek past the end of the file
[ME_DI_20210817_ClarifyCompNetworkZipLoad.yxmd] Tool 7 - ---------------------------------------------------------------------------
NameError Traceback (most recent call last)
in
1 # we need to use .any() component as data read would be in the form
2 # of record set and we need to take one row out of it
----> 3 zip_file_path = data['FullPath'].any()
4 temp_file_path = data['TempPath'].any()
5
NameError: name 'data' is not defined
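Reading the log top to bottom, the second traceback is a side effect of the first. Alteryx.read('#1') raised the RuntimeError ("Attempt to seek past the end of the file"), so `data` was never assigned. When the next cell ran, the first line that used `data` (the `.any()` line) hit NameError. So the `.any()` call is where the missing variable surfaced, not the cause. One way to make the log point straight at the real failure is to do the read through a small wrapper that stops with a clear message. This is a minimal sketch: `read_input` is a hypothetical helper, and the only assumption is that the reader behaves like Alteryx.read (connection name in, pandas DataFrame out).

```python
from typing import Callable

import pandas as pd


def read_input(reader: Callable[[str], pd.DataFrame], connection: str = "#1") -> pd.DataFrame:
    """Call the supplied reader and stop with one clear message if it fails.

    `reader` is assumed to behave like Alteryx.read: it takes a connection
    name such as '#1' and returns a pandas DataFrame.
    """
    try:
        data = reader(connection)
    except RuntimeError as exc:
        # Re-raise with the connection name so the log names the real cause
        # instead of a later NameError about `data`.
        raise RuntimeError(f"Reading input connection {connection!r} failed: {exc}") from exc
    if data is None or data.empty:
        raise RuntimeError(f"Input connection {connection!r} returned no rows")
    return data
```

In the Python tool this would be called as `data = read_input(Alteryx.read)`, ideally in the same cell as the lines that use `data`.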
The only slight difference from the guide is that my zip file path field is called FullPath instead of just "Path".
I would upload the workflow, but my organization blocks file uploads. Here is my exact Python code:
# List all non-standard packages to be imported by your
# script here (only missing packages will be installed)
from ayx import Package
#Package.installPackages(['pandas','numpy'])
#################################
# List all non-standard packages to be imported by your
# script here (only missing packages will be installed)
#Package.installPackages(['pandas','numpy'])
from csv import reader
import pandas as pd
import os
from zipfile import ZipFile
from ayx import Alteryx
data = Alteryx.read('#1')
# we need to use .any() component as data read would be in the form
# of record set and we need to take one row out of it
zip_file_path = data['FullPath'].any()
temp_file_path = data['TempPath'].any()
#declare variables
extensions = ['.csv']
#################################
# Create a ZipFile Object and load *.zip in it
with ZipFile(zip_file_path, 'r') as zipObj:
    # Extract all the contents of the zip file into the TempPath directory
    zipObj.extractall(temp_file_path)
#################################
# variable to hold list of paths for the extracted csv files
file_list = []
# loop through all the files in the directory
for root, dirnames, filenames in os.walk(temp_file_path):
    for file in filenames:
        # get file extension and name
        fname, fext = os.path.splitext(file)
        # keep the file only if its extension is one of the desired extensions
        if fext in extensions:
            # record the full path of the matching file
            filepath = os.path.join(root, file)
            file_list.append(filepath)
# get the combined data
combined_csv = pd.concat([pd.read_csv(f) for f in file_list])
# delete the files post processing
filenames = os.listdir(temp_file_path)
for filename in filenames:
    if filename.endswith(".csv"):
        os.remove(os.path.join(temp_file_path, filename))
#################################
Alteryx.write(combined_csv, 1)
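For reference, the extract, combine and clean-up steps above can be packaged into one function and tested outside Alteryx. This is a sketch, not the guide's code: `combine_zipped_csvs` is a hypothetical name, and it differs from the original in three deliberate ways. It matches `.csv` case-insensitively, it fails clearly when the archive contains no CSVs (plain `pd.concat([])` would raise a less helpful "No objects to concatenate"), and it deletes exactly the files it collected, including ones in sub-folders (the original clean-up loop only looks at the top level of TempPath, while `os.walk` also picks up nested files).

```python
import os
import zipfile

import pandas as pd


def combine_zipped_csvs(zip_file_path: str, temp_file_path: str) -> pd.DataFrame:
    """Extract a zip into temp_file_path, stack every .csv in it, then delete those CSVs."""
    with zipfile.ZipFile(zip_file_path, "r") as zip_obj:
        zip_obj.extractall(temp_file_path)

    # Walk the whole tree so CSVs inside sub-folders of the archive are found too.
    file_list = []
    for root, _dirnames, filenames in os.walk(temp_file_path):
        for name in filenames:
            if os.path.splitext(name)[1].lower() == ".csv":
                file_list.append(os.path.join(root, name))

    if not file_list:
        raise FileNotFoundError(f"No .csv files found in {zip_file_path}")

    try:
        # ignore_index=True gives the combined frame a clean 0..n-1 index.
        combined = pd.concat([pd.read_csv(f) for f in sorted(file_list)], ignore_index=True)
    finally:
        # Delete exactly the files that were collected, including nested ones.
        for f in file_list:
            os.remove(f)
    return combined
```

In the tool, the extraction and clean-up cells would then reduce to `combined_csv = combine_zipped_csvs(zip_file_path, temp_file_path)` followed by `Alteryx.write(combined_csv, 1)`.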
It looks like it's failing on this line:
temp_file_path = data['TempPath'].any()
but this line was fine:
zip_file_path = data['FullPath'].any()
So do you have an input named FullPath and an input named TempPath?
Yeah. I'm using a Directory tool, which outputs the FullPath column, and then a Formula tool to add the TempPath field. If I browse after the Formula tool, I can see all the columns just fine. It also works locally and only gives this error on the server, which is quite strange.
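Checking the column names is a sensible step, although the traceback shows this run failed earlier, inside Alteryx.read, so a guard like this only helps once the read itself succeeds. The sketch below names any missing column explicitly and takes the value with `.iloc[0]`. Series.any() is a boolean reduction ("is any element truthy?"), while `.iloc[0]` says "the first row" directly, which is what the guide's comment ("take one row out of it") intends. `require_columns` and `first_value` are hypothetical helpers; the column names are the ones from this workflow. If the Directory tool finds several zip files, only the first row is used.

```python
from typing import Iterable

import pandas as pd


def require_columns(data: pd.DataFrame, columns: Iterable[str]) -> None:
    """Raise a readable error if any expected column is absent."""
    missing = [c for c in columns if c not in data.columns]
    if missing:
        raise ValueError(f"Missing column(s) {missing}; input has {list(data.columns)}")


def first_value(data: pd.DataFrame, column: str) -> object:
    """Return the value of `column` in the first row of `data`."""
    if data.empty:
        raise ValueError("Input has no rows")
    return data[column].iloc[0]
```

Usage in the tool, after the read: `require_columns(data, ["FullPath", "TempPath"])`, then `zip_file_path = first_value(data, "FullPath")` and `temp_file_path = first_value(data, "TempPath")`.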