I have multiple datasets with column headers in Japanese. I ran the headers through Google Translate and put the original and translated header values in a spreadsheet so that when we receive updated datasets in the future, I can use this as a template to easily find and replace headers with the translation. I created what I thought was a pretty straightforward Python script to load each dataset into a dataframe, union all of the dataframes based on the common set of fields between them, and then rename the headers with the translations.
from ayx import Alteryx
import pandas as pd
# Import data & translation template
df_2401_2t = pd.read_excel('**filepath + sheet name**')
df_2401_4t = pd.read_excel('**filepath + sheet name**')
df_2601 = pd.read_excel('**filepath + sheet name**')
df_trans = pd.read_excel('**filepath + sheet name**')
frames = [df_2401_2t, df_2401_4t, df_2601]
# Union df on common field names
df = pd.concat(frames, join = 'inner', ignore_index = True)
# Find & replace headers with translations
for col in df.columns:
    for i in df_trans.index:
        if col == df_trans['Original'][i]:
            df.rename({col: df_trans['Translation'][i]}, axis = 1, inplace = True)
# Output df to workflow
Alteryx.write(df, 1)

However, I'm getting the following error when I try to write df back out to Alteryx:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-19-775ca6c91598> in <module>
----> 1 Alteryx.write(df, 1)
e:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\export.py in write(pandas_df, outgoing_connection_number, columns, debug, **kwargs)
86 """
87 return __CachedData__(debug=debug).write(
---> 88 pandas_df, outgoing_connection_number, columns=columns, **kwargs
89 )
90
e:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\CachedData.py in write(self, pandas_df, outgoing_connection_number, columns, output_filepath)
430 # (check only first non-null value in column -- tradeoff for efficiency)
431 col_contains_bytearrays = coltype == "object" and isinstance(
--> 432 firstValidValue(pandas_df[colname]), (bytearray, bytes)
433 )
434 try:
e:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\DataUtils.py in firstValidValue(pd_series)
46 def firstValidValue(pd_series):
47 if not isinstance(pd_series, pd.core.series.Series):
---> 48 raise TypeError(f"input must be a pandas series, not a {type(pd_series)}")
49 if hasattr(pd_series, "first_valid_index"):
50 first_valid_index = pd_series.first_valid_index()
TypeError: input must be a pandas series, not a <class 'pandas.core.frame.DataFrame'>
I'm not really following here. It's telling me the input must be a pandas Series, but I thought Alteryx.write() required a pandas DataFrame, which is exactly what df is. If anyone can point me in the right direction, it would be much appreciated.
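For anyone trying to reproduce this, here is a minimal sketch of one thing that could be checked. It assumes, without confirmation from my real data, that two original headers end up with the same translation, so the renamed df has duplicate column names. In pandas, selecting a duplicated label with df[colname] returns a DataFrame rather than a Series, which would be consistent with the error message above. The frame, the headers, and the translations below are all made up for illustration, and the dictionary-based rename is just a shorter stand-in for the nested loops in my script.

```python
import pandas as pd

# Hypothetical stand-in for the concatenated data; the headers are invented.
df = pd.DataFrame({"数量": [1, 2], "個数": [3, 4], "日付": ["2024-01", "2026-01"]})

# Hypothetical translation template in which two originals share one translation.
df_trans = pd.DataFrame({
    "Original": ["数量", "個数", "日付"],
    "Translation": ["Quantity", "Quantity", "Date"],
})

# Build the mapping once and rename in a single call instead of nested loops.
mapping = dict(zip(df_trans["Original"], df_trans["Translation"]))
renamed = df.rename(columns=mapping)

# Collect any column names that now appear more than once.
duplicated = renamed.columns[renamed.columns.duplicated()].tolist()

# Selecting a duplicated label yields a DataFrame, not a Series.
selection_type = type(renamed["Quantity"]).__name__

print(duplicated)
print(selection_type)
```

If the duplicated list comes back non-empty on the real data, the translations would presumably need to be made unique before calling Alteryx.write().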