Hello Team,
Do we have any Python code that extracts data from a PDF file, writes it to an Alteryx database, and runs a comparison?
Hi @rohit782192, it's not Python (although you could definitely do it in Python), but I made an R-based macro that reads PDFs into Alteryx as text. You can find it here: https://gallery.alteryx.com/#!app/PDF-Input/5b685aff0462d710907f7a3b
You'll also need to install the R package 'pdftools' for the macro to work.
In a regular Jupyter notebook in Anaconda, I can do the same in Python using Tabula or Camelot.
If you already have Python code that extracts data from PDFs, then you can use it in the Python tool. You can write data out of the Python tool using the Alteryx.write() function. From there you can output to .yxdb using an Output Data tool.
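For what it's worth, here is a minimal sketch of how that could look inside the Python tool. It assumes the tabula-py package (imported as tabula, and it needs Java on the machine) is already installed. The function names read_pdf_tables and write_tables and the example path are hypothetical, just for illustration.

```python
def read_pdf_tables(pdf_path, pages="all"):
    """Return a list of pandas DataFrames, one per table tabula finds."""
    # Imported lazily: tabula comes from the tabula-py package and needs Java.
    import tabula
    return tabula.read_pdf(pdf_path, pages=pages, multiple_tables=True)


def write_tables(tables, max_anchors=5):
    """Send each table to its own Python tool output anchor (1 to 5)."""
    # ayx is only importable inside the Alteryx Python tool.
    from ayx import Alteryx
    for anchor, table in enumerate(tables[:max_anchors], start=1):
        Alteryx.write(table, anchor)
    # Number of tables actually written; anything past the fifth is skipped.
    return min(len(tables), max_anchors)


# Inside the Python tool you would then call something like
# (the path is a placeholder):
# write_tables(read_pdf_tables(r"C:\data\report.pdf"))
```

Each table goes to a separate anchor because tables pulled from one PDF often have different columns. If your PDF has a single table, writing tables[0] to anchor 1 is enough.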
Hi @OllieClarke,
I am just looking into Alteryx's capability for reading PDFs and found topics on using Python code. One of the discussions mentioned the need to install additional software. Is this required, or, if I have the Python code, can I run it with Alteryx alone?
This is the discussion that mentions needing additional software installed:
Extracting Tabular Data from PDF Documents with Python Code Tool
Thanks
Hi,
I did the PDF-to-Excel conversion outside of Alteryx, in a Jupyter notebook using tabula, and then did the comparison in Alteryx.
It works for me.
Hi @rohit782192, I am hoping to work within Alteryx only, but it's good to know other options. What is tabula?
@rohit782192 Hi Rohit. Is there a way to automate this?
@sonseeahray - attached is a sample implementation using the Python tabula package for your reference. Modify the PDF file location in the Text Input tool and run the workflow. The code installs the tabula package and extracts table data from your PDF.
You will have to run Alteryx as an Administrator to be able to install Python packages. Make sure to comment out this line (Package.installPackages(['pandas','numpy','tabula'])) after you run it the first time.
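If you would rather not comment the install line in and out by hand, one option is to install only when a package cannot be imported yet. This is a rough sketch, not what the attached workflow does. The helper names missing_packages and install_if_missing are hypothetical, and the mapping in the usage comment is an assumption: on PyPI the tabula module is distributed as tabula-py, so adjust the names to whatever your workflow actually installs.

```python
import importlib.util


def missing_packages(import_names):
    """Return the top-level module names that cannot currently be imported."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


def install_if_missing(import_to_pip):
    """Install the pip package for every module that is not importable yet.

    import_to_pip maps a top-level import name to the pip package name.
    Returns the list of pip names that were passed to the installer.
    """
    needed = [import_to_pip[name] for name in missing_packages(import_to_pip)]
    if needed:
        # ayx is only importable inside the Alteryx Python tool; installing
        # still requires Alteryx to be running as an Administrator.
        from ayx import Package
        Package.installPackages(needed)
    return needed


# Example call inside the Python tool (package names are assumptions):
# install_if_missing({"pandas": "pandas", "numpy": "numpy", "tabula": "tabula-py"})
```

After the first successful run the call does nothing, so the workflow can stay unchanged between runs.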
Hope this helps.