Alteryx Designer Desktop Discussions

rohit782192 · ‎03-10-2020

Hello Team,

Do we have any proper Python code that extracts a PDF file, writes it to an Alteryx database, and does a comparison?

OllieClarke · ‎03-10-2020

Hi @rohit782192, it's not Python (although you could definitely do it in Python), but I made an R-based macro which reads PDFs into Alteryx as text. You can find it here: https://gallery.alteryx.com/#!app/PDF-Input/5b685aff0462d710907f7a3b

You'll also need to install the R package 'pdftools' for the macro to work.

rohit782192 · ‎03-10-2020

In a normal Jupyter notebook in Anaconda I can do the same in Python, using Tabula or Camelot.

OllieClarke · ‎03-10-2020

If you already have Python code that extracts PDFs, then you can use it in the Python tool. You can write data out of the Python tool using the Alteryx.write() function. From there you can output to .yxdb using an Output tool.
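
For reference, here is a minimal sketch of what that could look like inside the Python tool. It assumes tabula-py is installed (and Java is available on the machine), that the PDF contains at least one table, and that output anchor 1 is connected to the Output tool. The helper name `write_first_pdf_table` is made up for illustration and is not part of Alteryx or tabula.

```python
def write_first_pdf_table(pdf_path, anchor=1):
    """Extract tables from a PDF and send the first one to a Python tool output anchor."""
    # Imports are kept inside the function because both modules are only
    # available in the right environment: tabula comes from the tabula-py
    # package, and ayx is only importable inside the Alteryx Python tool.
    import tabula
    from ayx import Alteryx

    # With multiple_tables=True, read_pdf returns a list of pandas DataFrames.
    tables = tabula.read_pdf(pdf_path, pages="all", multiple_tables=True)
    if not tables:
        raise ValueError("No tables found in " + str(pdf_path))

    # Alteryx.write sends a DataFrame to the given output anchor (1 to 5).
    Alteryx.write(tables[0], anchor)
    return tables[0]
```

Calling `write_first_pdf_table(r"C:\path\to\file.pdf")` in the Python tool would then make the table available downstream, where an Output tool can save it as .yxdb. Only the first table is written here; a PDF with several tables would need them concatenated or sent to separate anchors.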

sonseeahray · ‎03-16-2020

Hi @OllieClarke,

I am just looking into Alteryx's ability to read PDFs and found some topics on using Python code. One of the discussions mentioned the need to install additional software. Is this required, or, if I have the Python code, can I run Alteryx alone?

This is the discussion that mentions needing additional software installed:

Extracting Tabular Data from PDF Documents with Python Code Tool

Thanks

rohit782192 · ‎03-16-2020

Hi,

I did the PDF-to-Excel conversion outside of Alteryx, in a Jupyter notebook using tabula, and then did the comparison in Alteryx.

It works for me.
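
If the comparison step were also done in Python rather than in Alteryx, a rough sketch could look like the one below. It assumes both tables have already been exported to CSV files with the same column order; the function names `read_rows` and `compare_rows` are illustrative only. Rows are compared as whole tuples using sets, so duplicate rows and row order are ignored.

```python
import csv


def read_rows(csv_path):
    """Load a CSV file as a list of row tuples, skipping the header line."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row if there is one
        return [tuple(cell.strip() for cell in row) for row in reader]


def compare_rows(left_rows, right_rows):
    """Report which rows appear in only one of the two tables, or in both."""
    left, right = set(left_rows), set(right_rows)
    return {
        "only_in_left": sorted(left - right),
        "only_in_right": sorted(right - left),
        "in_both": sorted(left & right),
    }
```

For example, `compare_rows(read_rows("from_pdf.csv"), read_rows("reference.csv"))` would list the rows that differ between the two files (both file names are placeholders).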

sonseeahray · ‎03-16-2020

Hi @rohit782192, I am hoping to work within Alteryx only, but it's good to know other options. What is tabula?

rohit782192 · ‎03-16-2020

tabula-py is a simple Python wrapper of tabula-java, which can read tables in a PDF. You can read tables from a PDF and convert them into a pandas DataFrame. tabula-py also enables you to convert a PDF file into a CSV/TSV/JSON file.

Thanks and Regards,
Rohit Gupta.
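
As a rough illustration of the PDF-to-CSV conversion described above, the sketch below converts every PDF in a folder, which is one way the step could be automated. It assumes tabula-py and Java are installed; the folder arguments and the helper names `csv_path_for` and `convert_folder` are hypothetical.

```python
from pathlib import Path


def csv_path_for(pdf_path, out_dir):
    """Build the output CSV path for a PDF: same file name, .csv extension."""
    return Path(out_dir) / (Path(pdf_path).stem + ".csv")


def convert_folder(pdf_dir, out_dir):
    """Convert every *.pdf in pdf_dir to a CSV file in out_dir using tabula-py."""
    import tabula  # provided by the tabula-py package; requires Java

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        target = csv_path_for(pdf_path, out_dir)
        # convert_into writes the tables found on the requested pages to one file.
        tabula.convert_into(str(pdf_path), str(target), output_format="csv", pages="all")
        written.append(target)
    return written
```

The resulting CSV files can then be read with a regular Input Data tool, so the comparison itself can stay in Alteryx.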

sonseeahray · ‎03-18-2020

@rohit782192 Hi Rohit. Is there a way to automate this?

AbhilashR · ‎03-18-2020

@sonseeahray - attached is a sample implementation of the Python tabula package for your reference. Modify the PDF file location in the Text Input tool and run the workflow. This code will install the tabula package and extract table data from your PDF.

You will have to run Alteryx as an Administrator to be able to install Python packages. Make sure to comment out this line (Package.installPackages(['pandas','numpy','tabula'])) after you run it the first time.
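
As an alternative to commenting the install line out by hand, the install could be guarded so it only runs when something is missing. The sketch below is one way to do that and is not taken from the attached workflow; `install_if_missing` is a hypothetical helper. It maps import names to pip names because the two can differ: as far as I know, the `tabula` module that provides `read_pdf` is distributed on PyPI as `tabula-py`, so adjust the names to whatever the workflow actually uses.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be imported right now."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def install_if_missing(requirements):
    """Install only the packages whose modules are missing.

    requirements maps an import name to the pip package name that provides it.
    """
    needed = [requirements[name] for name in missing_modules(requirements)]
    if needed:
        # ayx is only importable inside the Alteryx Python tool, so it is
        # imported here, and only when an install is actually required.
        from ayx import Package
        Package.installPackages(needed)
    return needed
```

In the Python tool this could be called as `install_if_missing({"pandas": "pandas", "numpy": "numpy", "tabula": "tabula-py"})`. Alteryx still has to run as Administrator the first time, when packages are actually installed.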

Hope this helps.

Python Code for Comparison in Alteryx.