I have a dataset which is similar to the sample dataset attached.
My original dataset is such that I have used Python tool to manipulate it. One of the final steps in it is to convert multiple long format columns into wide format. After this I need to extract this output in an excel file. In order to achieve this, I have written "Alteryx.write(df,1)" but it gives me the error as mentioned below:
Error: Python (2): ---------------------------------------------------------------------------
ReferenceError Traceback (most recent call last)
<ipython-input-7-1fa927026b68> in <module>
----> 1 Alteryx.write(df1,1)
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\export.py in write(pandas_df, outgoing_connection_number, columns, debug, **kwargs)
85 When running the workflow in Alteryx, this function will convert a pandas data frame to an Alteryx data stream and pass it out through one of the tool's five output anchors. When called from the Jupyter notebook interactively, it will display a preview of the pandas dataframe. An optional 'columns' argument allows column metadata to specify the field type, length, and name of columns in the output data stream.
86 """
---> 87 return __CachedData__(debug=debug).write(
88 pandas_df, outgoing_connection_number, columns=columns, **kwargs
89 )
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\CachedData.py in write(self, pandas_df, outgoing_connection_number, columns, output_filepath)
639 try:
640 # get the data from the sql db (if only one table exists, no need to specify the table name)
--> 641 data = db.writeData(pandas_df_out, metadata=write_metadata)
642 # print success message
643 if outgoing_connection_number is not None:
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\Datafiles.py in writeData(self, pandas_df, metadata)
750 )
751 print(error_msg)
--> 752 raise ReferenceError(error_msg)
753
754 metadata_col = metadata[colname]
ReferenceError: [Datafile.writeData]: metadata is missing column contained in pandas_df: ('Hourly', 'IT', '2018')
I'm unable to rectify this error. Could someone please help me solve this error so that I can extract the output in an excel file? I have attached the sample input and workflow for reference. Thank you!
Solved! Go to Solution.
Hi @keeprollin ,
can you let us know what you're trying to achieve with the data?
I'm not sure why you would use Python to manipulate data as this is much slower than simply doing it in Alteryx.
Could you provide a sample output of the data that you require?
Thanks,
M.
Hi @keeprollin
I believe the issue is with the .pivot_table function that you're using in Python. This returns a dataframe with hierarchical indexes, i.e multple levels of column headers. Alteryx doesn't know what to do with these as it can only handle flat dataframes. The simplest, almost equivalent, workflow is this
The formula tool builds up the header and the crosstab tool widens the data using the new header field. The result looks like this
There are ways that you can take the compacted headers and split them out into dummy header rows, Type, Dept and Year, but these force the resulting columns in text format which can issue when output to excel.
Dan
User | Count |
---|---|
19 | |
15 | |
13 | |
9 | |
8 |