A pandas dataframe is required for passing data to outgoing connections in Alteryx

wonka1234
10 - Fireball

Hi,

Is there a limit on the DataFrame size when using Alteryx.write(df, 1)?

I can't seem to write my DataFrame to an anchor.

My code uses zipfile and pd.read_excel to get the data, and for some reason it is not being read in as a DataFrame, even though it clearly is one when I run it locally.

Is there any way to convert it again to make sure I have a DataFrame in Alteryx?

apathetichell
19 - Altair

Share your python code.

wonka1234
10 - Fireball

@apathetichell 

 

from ayx import Alteryx


import pandas as pd
import zipfile
xlrd = Alteryx.importPythonModule("C:\\Users\\user\\.conda\\envs\\Python\\Lib\\site-packages\\xlrd")

archive = zipfile.ZipFile(r'O:\Alteryx\Community\Original - All fields - October 2022.zip')
xlfile = archive.open('Original - All fields - October 2022.xls')

df = pd.concat(pd.read_excel(xlfile, header = 1, sheet_name=None), ignore_index=True)

#print(df)

Alteryx.write(df, 1)
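
For the archive-reading step above, here is a minimal sketch (an assumption about what may help, not a confirmed fix) that checks the member name against the archive's contents and reads the member fully into a seekable in-memory buffer before handing it to pandas. The helper name read_member and the demo file example.txt are hypothetical.

```python
import io
import zipfile


def read_member(archive_source, member_name):
    """Return one archive member as a seekable in-memory buffer."""
    with zipfile.ZipFile(archive_source) as archive:
        names = archive.namelist()
        # Fail early with the real member names if the expected one is missing.
        if member_name not in names:
            raise FileNotFoundError(
                f"{member_name!r} not in archive; found: {names}"
            )
        return io.BytesIO(archive.read(member_name))


# Self-contained demo: build a tiny archive in memory instead of touching disk.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("example.txt", "hello")

buffer = read_member(demo, "example.txt")
print(buffer.read())  # b'hello'
```

With the file from this thread, the returned buffer would be passed as pd.read_excel(buffer, header=1, sheet_name=None), which returns a dict of DataFrames keyed by sheet name; pd.concat then stacks that dict into a single DataFrame.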

Felipe_Ribeir0
16 - Nebula

Hi @wonka1234 

 

What do you see when you print what you called 'df'?


df = process_files(month_to_process)

print(df)

wonka1234
10 - Fireball

@Felipe_Ribeir0 Ah, I get "None" when I print df. Sigh, not sure where the conversion is failing.

Felipe_Ribeir0
16 - Nebula

@wonka1234 So this is your problem: if df is not a DataFrame, you cannot use this piece of code: Alteryx.write(df, 1)

 

Go back through your code, see what is missing, and make sure that df is a DataFrame. Then it will work.

 

 

apathetichell
19 - Altair

Did you import pandas as pd?

wonka1234
10 - Fireball

@apathetichell Yes, it is imported.

@Felipe_Ribeir0

 

In the code below I can print df fine.

 

So why isn't this code working?

 

from ayx import Alteryx


import pandas as pd
import zipfile
xlrd = Alteryx.importPythonModule("C:\\Users\\user\\.conda\\envs\\Python\\Lib\\site-packages\\xlrd")

archive = zipfile.ZipFile(r'O:\Alteryx\Community\Original - All fields - October 2022.zip')
xlfile = archive.open('Original - All fields - October 2022.xls')

df = pd.concat(pd.read_excel(xlfile, header = 1, sheet_name=None), ignore_index=True)

#print(df)

Alteryx.write(df, 1)

 

and I'm getting this huge error:

 

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
c:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2888             try:
-> 2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-2-d532070a98dc> in <module>
     19 #print(df)
     20 
---> 21 Alteryx.write(df, 1)

c:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\ayx\export.py in write(pandas_df, outgoing_connection_number, columns, debug, **kwargs)
     85     When running the workflow in Alteryx, this function will convert a pandas data frame to an Alteryx data stream and pass it out through one of the tool's five output anchors. When called from the Jupyter notebook interactively, it will display a preview of the pandas dataframe. An optional 'columns' argument allows column metadata to specify the field type, length, and name of columns in the output data stream.
     86     """
---> 87     return __CachedData__(debug=debug).write(
     88         pandas_df, outgoing_connection_number, columns=columns, **kwargs
     89     )

c:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\ayx\CachedData.py in write(self, pandas_df, outgoing_connection_number, columns, output_filepath)
    426 
    427         for index, colname in enumerate(pandas_df.columns):
--> 428             coltype = str(pandas_df.dtypes[index])
    429             # does the column contain bytearrays? then its probably a blob
    430             # (check only first non-null value in column -- tradeoff for efficiency)

c:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    880 
    881         elif key_is_scalar:
--> 882             return self._get_value(key)
    883 
    884         if (

c:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
    989 
    990         # Similar to Index.get_value, but we do not fall back to positional
--> 991         loc = self.index.get_loc(label)
    992         return self.index._get_values_for_loc(self, loc, label)
    993 

c:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:
-> 2891                 raise KeyError(key) from err
   2892 
   2893         if tolerance is not None:

KeyError: 0
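
One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: the failure is inside ayx\CachedData.py at pandas_df.dtypes[index], where index is a column position. In the pandas version shown, an integer key on a Series is looked up by label instead of position when the index itself holds integers, so KeyError: 0 would be consistent with a DataFrame whose column labels include non-string values such as numbers. The sketch below shows how such labels could be found and converted to strings; the helper names and the sample labels are hypothetical.

```python
def non_string_labels(labels):
    """Return the labels that are not already strings (e.g. ints or floats)."""
    return [label for label in labels if not isinstance(label, str)]


def stringify_labels(labels):
    """Return every label converted to a string, in the original order."""
    return [str(label) for label in labels]


# Hypothetical labels, as they might look after concatenating several sheets.
labels = ["Account", 2022, "Region", 0.5]
print(non_string_labels(labels))  # [2022, 0.5]
print(stringify_labels(labels))   # ['Account', '2022', 'Region', '0.5']
```

In the Python tool this could be applied as df.columns = stringify_labels(df.columns) just before Alteryx.write(df, 1). If non_string_labels(df.columns) comes back empty, this explanation does not apply and the cause lies elsewhere.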

 

Felipe_Ribeir0
16 - Nebula

@wonka1234 Yes, you won't be able to use Alteryx.write(df, 1) unless df is a DataFrame. That is what this error is saying. What you called df is not a DataFrame (as you saw when you tried to print it), so it is a good idea to go back into your code and see why.

apathetichell
19 - Altair

Try something like:

 

from ayx import Alteryx
import pandas as pd
import zipfile
xlrd = Alteryx.importPythonModule("C:\\Users\\user\\.conda\\envs\\Python\\Lib\\site-packages\\xlrd")

archive = zipfile.ZipFile(r'O:\Alteryx\Community\Original - All fields - October 2022.zip')
xlfile = archive.open('Original - All fields - October 2022.xls')

# sheet_name=None returns a dict of DataFrames (one per sheet); pd.concat stacks them,
# and wrapping the result in pd.DataFrame() makes the conversion explicit.
df = pd.DataFrame(pd.concat(pd.read_excel(xlfile, header=1, sheet_name=None), ignore_index=True))

#print(df)

Alteryx.write(df, 1)

Basically you need to manually convert the result into a DataFrame, so that pd.DataFrame() call is key.
 

 
