Hello Team,
Do we have any working Python code that extracts data from a PDF file, writes it to an Alteryx database, and does a comparison?
@AbhilashR Thank you very much. I will look at this now.
@AbhilashR Would you step me through installing the Python packages?
@sonseeahray - assuming you are running Alteryx as an admin on your local machine, these lines of code in the Python tool should install the packages on your machine:
from ayx import Package
Package.installPackages(['pandas','tabula'])
@sonseeahray - could you paste the whole error message?
@AbhilashR Yes, here you go and thank you!
@sonseeahray - I am unable to explain what is causing the error. Can you unzip the attached solution and run it on your machine? Keep in mind, Tabula is best suited to PDFs that contain tabular data. If you are looking to extract plain text, you could use the wonderful solution put together by @OllieClarke.
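For readers without the attached workflow, here is a minimal sketch of what table extraction with Tabula might look like. It assumes the `tabula` module comes from the tabula-py package (which needs Java installed); the function names `extract_tables` and `non_empty_tables` and the file name are hypothetical, and this is not necessarily how the attached solution works.

```python
def non_empty_tables(tables):
    """Keep only tables that contain at least one row.

    Works for pandas DataFrames (len() is the row count) and plain lists.
    """
    return [table for table in tables if len(table) > 0]


def extract_tables(pdf_path):
    """Read every table Tabula can find in the PDF and drop the empty ones."""
    # Imported lazily so the helper above can be used without tabula-py.
    import tabula

    # read_pdf returns a list of DataFrames, one per detected table.
    tables = tabula.read_pdf(pdf_path, pages="all", multiple_tables=True)
    return non_empty_tables(tables)


# Hypothetical usage inside the Python tool:
# tables = extract_tables("report.pdf")
# if not tables:
#     print("Tabula found no tables; the PDF may contain only plain text.")
```

If the returned list is empty, the PDF probably has no tabular data that Tabula can detect, and a plain-text approach may be the better fit.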
@AbhilashR okay, I'll give it a try. thanks!
Hello,
I was hopeful this would work for me but got the following errors. Not sure how to proceed.
Thanks,
Neil
SUCCESS: reading input data "#1"
[Datafile.writeData]: metadata arg is required for yxdb and expected to be dict like {'Field1': {'type': 'FixedDecimal', 'length': (8, 3), 'source': 'PythonTool:', 'description': 'my description'}, 'Field2': {...}}
Error: unable to write output (C:\Users\username\AppData\Local\Temp\Engine_5940_7bdf89a872a5430a89b0fc118f2fa074_\4bb7e0f98f3e2202be5e41703d97d9ec\output_1.yxdb)
ERROR: writing outgoing connection data 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-7398b652318e> in <module>
     19     output_data = output_data.append(pandas.DataFrame.from_records(data[a]))
     20
---> 21 Alteryx.write(output_data,1)

c:\users\username\appdata\local\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\ayx\export.py in write(pandas_df, outgoing_connection_number, columns, debug, **kwargs)
     85     When running the workflow in Alteryx, this function will convert a pandas data frame to an Alteryx data stream and pass it out through one of the tool's five output anchors. When called from the Jupyter notebook interactively, it will display a preview of the pandas dataframe. An optional 'columns' argument allows column metadata to specify the field type, length, and name of columns in the output data stream.
     86     """
---> 87     return __CachedData__(debug=debug).write(
     88         pandas_df, outgoing_connection_number, columns=columns, **kwargs
     89     )

c:\users\username\appdata\local\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\ayx\CachedData.py in write(self, pandas_df, outgoing_connection_number, columns, output_filepath)
    639     try:
    640         # get the data from the sql db (if only one table exists, no need to specify the table name)
--> 641         data = db.writeData(pandas_df_out, metadata=write_metadata)
    642         # print success message
    643         if outgoing_connection_number is not None:

c:\users\username\appdata\local\alteryx\bin\miniconda3\envs\designerbasetools_venv\lib\site-packages\ayx\Datafiles.py in writeData(self, pandas_df, metadata)
    731         )
    732         print(error_msg)
--> 733         raise TypeError(error_msg)
    734     elif len(metadata) != len(pandas_df.columns):
    735         error_msg = msg_prefix.format(

TypeError: [Datafile.writeData]: metadata arg is required for yxdb and expected to be dict like {'Field1': {'type': 'FixedDecimal', 'length': (8, 3), 'source': 'PythonTool:', 'description': 'my description'}, 'Field2': {...}}
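The error message describes the dictionary shape that the writer expects, and the traceback shows that `Alteryx.write` accepts an optional `columns` argument. The following is a rough sketch of building such a dictionary by hand. The helper name `build_columns_metadata`, the example column names, and the default type and length (`V_WString`, 254) are assumptions for illustration. The thread does not confirm the cause of the error; one possibility worth checking is that `output_data` has no columns because no tables were extracted from the PDF.

```python
def build_columns_metadata(column_names, field_type="V_WString", length=254):
    """Build a dict shaped like the one quoted in the error message.

    field_type and length are assumed defaults; adjust them per column.
    """
    metadata = {}
    for name in column_names:
        # Keys are converted to strings so integer column labels also work.
        metadata[str(name)] = {
            "type": field_type,
            "length": length,
            "source": "PythonTool:",
            "description": "",
        }
    return metadata


# Hypothetical column names, for illustration only.
columns = build_columns_metadata(["Invoice", "Amount"])
print(columns)

# Inside the Alteryx Python tool one might then try (not runnable elsewhere):
# if len(output_data.columns) == 0:
#     print("output_data has no columns; nothing to write")
# else:
#     Alteryx.write(output_data, 1, columns=build_columns_metadata(output_data.columns))
```

Checking for an empty data frame before calling `Alteryx.write` at least separates "nothing was extracted" from a genuine metadata problem.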