Alteryx Designer Desktop Discussions

wonka1234 · 11-29-2022

Hi,

Is there a limit on DataFrame size when using Alteryx.write(df, 1)?

I can't seem to write my DataFrame to an output anchor.

My code uses zipfile and pd.read_excel to get the data, and for some reason it is not coming through as a DataFrame, even though it clearly is one when I run it locally.

Is there any way to convert it again to make sure I have a DataFrame in Alteryx?
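A quick way to see what is actually reaching the output anchor is to check the object's type just before calling Alteryx.write. This is a minimal sketch. `describe_for_output` is a hypothetical helper and is not part of the ayx package. The example inputs are made up.

```python
import pandas as pd

def describe_for_output(obj):
    """Hypothetical diagnostic: report what is about to be sent to an output anchor."""
    if isinstance(obj, pd.DataFrame):
        return f"DataFrame: {obj.shape[0]} rows x {obj.shape[1]} columns"
    return f"not a DataFrame: {type(obj).__name__}"

print(describe_for_output(pd.DataFrame({"a": [1, 2]})))  # DataFrame: 2 rows x 1 columns
print(describe_for_output(None))                         # not a DataFrame: NoneType
print(describe_for_output({"Sheet1": pd.DataFrame()}))   # not a DataFrame: dict
```

Calling print(describe_for_output(df)) right before Alteryx.write(df, 1) in the Python tool shows what df is. It distinguishes a DataFrame from None and from the dict that pd.read_excel returns when sheet_name=None.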

apathetichell · 11-29-2022

Share your python code.

wonka1234 · 11-30-2022

@apathetichell

from ayx import Alteryx

import pandas as pd
import zipfile
xlrd = Alteryx.importPythonModule("C:\\Users\\user\\.conda\\envs\\Python\\Lib\\site-packages\\xlrd")

archive = zipfile.ZipFile(r'O:\Alteryx\Community\Original - All fields - October 2022.zip')
xlfile = archive.open('Original - All fields - October 2022.xls')

df = pd.concat(pd.read_excel(xlfile, header = 1, sheet_name=None), ignore_index=True)

#print(df)

Alteryx.write(df, 1)
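For reference, pd.read_excel(..., sheet_name=None) returns a dict that maps each sheet name to its own DataFrame. pd.concat accepts that dict directly, so the concat line above should produce a single DataFrame. This is a minimal sketch of that step. A small dict stands in for the zipped workbook, and its sheet names and values are hypothetical.

```python
import pandas as pd

# Stand-in for pd.read_excel(xlfile, header=1, sheet_name=None), which returns
# a dict mapping each sheet name to a DataFrame (names and values made up).
sheets = {
    "Sheet1": pd.DataFrame({"Name": ["a", "b"], "Value": [1, 2]}),
    "Sheet2": pd.DataFrame({"Name": ["c"], "Value": [3]}),
}

# pd.concat accepts the dict directly; ignore_index=True discards the
# per-sheet row labels and numbers the stacked rows 0..n-1.
df = pd.concat(sheets, ignore_index=True)

print(type(df).__name__)   # DataFrame
print(df.shape)            # (3, 2)
```

As long as every sheet reads cleanly, df here is already a DataFrame before Alteryx.write is called.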

Felipe_Ribeir0 · 11-30-2022

Hi @wonka1234

What do you see when you print what you called 'df'?

df = process_files(month_to_process)

print(df)

wonka1234 · 11-30-2022

@Felipe_Ribeir0 Ah, I get "None" when I print df. Sigh, I'm not sure where it isn't being converted.

Felipe_Ribeir0 · 11-30-2022

@wonka1234 So this is your problem: if df is not a DataFrame, you cannot use Alteryx.write(df, 1).

Go back through your code, find what is missing, and make sure df is a DataFrame; then it will work.
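One common cause of getting None back from df = process_files(month_to_process) is a function that builds a DataFrame but never returns it. A Python function with no return statement returns None. The body of process_files is not shown in the thread, so both versions below are hypothetical and only sketch the pattern.

```python
import pandas as pd

def process_files_broken(month):
    """Hypothetical stand-in: builds a frame but forgets to return it."""
    df = pd.DataFrame({"month": [month]})
    df["processed"] = True
    # no `return df`, so Python returns None implicitly

def process_files(month):
    """Same hypothetical logic, with the DataFrame handed back to the caller."""
    df = pd.DataFrame({"month": [month]})
    df["processed"] = True
    return df

month_to_process = "October 2022"
print(process_files_broken(month_to_process))  # None
df = process_files(month_to_process)
print(isinstance(df, pd.DataFrame))            # True
```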

apathetichell · 11-30-2022

Did you import pandas as pd?

wonka1234 · 11-30-2022

@apathetichell Yes, it is imported.

@Felipe_Ribeir0

With the code below, I can print the DataFrame fine.

So why isn't this code working?

from ayx import Alteryx

import pandas as pd
import zipfile
xlrd = Alteryx.importPythonModule("C:\\Users\\user\\.conda\\envs\\Python\\Lib\\site-packages\\xlrd")

archive = zipfile.ZipFile(r'O:\Alteryx\Community\Original - All fields - October 2022.zip')
xlfile = archive.open('Original - All fields - October 2022.xls')

df = pd.concat(pd.read_excel(xlfile, header = 1, sheet_name=None), ignore_index=True)

#print(df)

Alteryx.write(df, 1)

and I'm getting this huge error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
c:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2888             try:
-> 2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-2-d532070a98dc> in <module>
     19 #print(df)
     20 
---> 21 Alteryx.write(df, 1)

c:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\ayx\export.py in write(pandas_df, outgoing_connection_number, columns, debug, **kwargs)
     85     When running the workflow in Alteryx, this function will convert a pandas data frame to an Alteryx data stream and pass it out through one of the tool's five output anchors. When called from the Jupyter notebook interactively, it will display a preview of the pandas dataframe. An optional 'columns' argument allows column metadata to specify the field type, length, and name of columns in the output data stream.
     86     """
---> 87     return __CachedData__(debug=debug).write(
     88         pandas_df, outgoing_connection_number, columns=columns, **kwargs
     89     )

c:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\ayx\CachedData.py in write(self, pandas_df, outgoing_connection_number, columns, output_filepath)
    426 
    427         for index, colname in enumerate(pandas_df.columns):
--> 428             coltype = str(pandas_df.dtypes[index])
    429             # does the column contain bytearrays? then its probably a blob
    430             # (check only first non-null value in column -- tradeoff for efficiency)

c:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    880 
    881         elif key_is_scalar:
--> 882             return self._get_value(key)
    883 
    884         if (

c:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
    989 
    990         # Similar to Index.get_value, but we do not fall back to positional
--> 991         loc = self.index.get_loc(label)
    992         return self.index._get_values_for_loc(self, loc, label)
    993 

c:\program files\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:
-> 2891                 raise KeyError(key) from err
   2892 
   2893         if tolerance is not None:

KeyError: 0
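The traceback narrows this down. The error is raised inside ayx's write loop at `pandas_df.dtypes[index]`, after `pandas_df.columns` has already been read. At that point ayx passes the column's integer position as a key to a Series that is labelled by column names. pandas treats an integer key as a position only when the labels are not numeric or mixed with numbers. Otherwise it looks up the label 0 and raises `KeyError: 0`.

The sketch below assumes that the header row mixes text with numeric cells such as years. The thread does not confirm this, and the frame is made up.

```python
import pandas as pd

# Made-up frame whose header row mixed text and numbers (e.g. years).
df = pd.DataFrame([["East", 10, 12], ["West", 7, 9]],
                  columns=["Region", 2021, 2022])
print(df.columns.inferred_type)  # mixed-integer

# Mimic the lookup in ayx's write(): an integer position used as a key.
try:
    df.dtypes[0]
    lookup_failed = False
except KeyError as err:
    lookup_failed = True
    print("KeyError:", err)  # KeyError: 0

# Workaround: turn every column label into a string before writing.
df.columns = df.columns.astype(str)
print(df.columns.inferred_type)  # string
print(list(df.columns))          # ['Region', '2021', '2022']
# Alteryx.write(df, 1)  # only inside Designer's Python tool
```

If the real workbook has numeric header cells, adding df.columns = df.columns.astype(str) just before Alteryx.write(df, 1) is a low-risk thing to try.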

Felipe_Ribeir0 · 11-30-2022

@wonka1234 Yes, you won't be able to use Alteryx.write(df, 1) unless df is a DataFrame. This is what this error is saying. What you called df is not a DataFrame (as you saw when you tried to print it), so it is a good idea to go back into your code and see why.

apathetichell · 11-30-2022

Try something like:

from ayx import Alteryx

import pandas as pd
import zipfile
xlrd = Alteryx.importPythonModule("C:\\Users\\user\\.conda\\envs\\Python\\Lib\\site-packages\\xlrd")

archive = zipfile.ZipFile(r'O:\Alteryx\Community\Original - All fields - October 2022.zip')
xlfile = archive.open('Original - All fields - October 2022.xls')

# read_excel(sheet_name=None) returns a dict of DataFrames, one per sheet;
# concatenate them first, then wrap the result in pd.DataFrame() explicitly
df = pd.DataFrame(pd.concat(pd.read_excel(xlfile, header=1, sheet_name=None), ignore_index=True))

#print(df)

Alteryx.write(df, 1)

Basically, you need to explicitly convert the result into a DataFrame, so the pd.DataFrame() call is key.
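Wrapping an object in pd.DataFrame() guarantees its type, but the column labels pass through unchanged. If those labels mix text and numbers, they stay that way. The following is a minimal sketch with made-up data. `to_output_frame` is a hypothetical helper that wraps the object and also turns every label into a string before it is written.

```python
import pandas as pd

def to_output_frame(obj):
    """Hypothetical helper: coerce obj to a DataFrame with string column labels."""
    out = pd.DataFrame(obj)           # guarantees the type; labels pass through unchanged
    return out.rename(columns=str)    # e.g. 2021 -> "2021"; returns a new frame

# Made-up input whose column labels mix text and an integer.
raw = pd.DataFrame([["East", 10]], columns=["Region", 2021])

wrapped = pd.DataFrame(raw)
print(type(wrapped).__name__, wrapped.columns.inferred_type)  # DataFrame mixed-integer

df = to_output_frame(raw)
print(list(df.columns))  # ['Region', '2021']
# Alteryx.write(df, 1)  # only inside Designer's Python tool
```

rename(columns=str) is used instead of assigning to out.columns. It returns a new frame, so the caller's original object keeps its labels.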


A pandas dataframe is required for passing data to outgoing connections in Alteryx