Alteryx Designer Discussions

SOLVED

Python Code for Comparison in Alteryx.

Alteryx Certified Partner

Hi @rohit782192, it's not Python (although you could definitely do it in Python), but I made an R-based macro that reads PDFs into Alteryx as text. You can find it here: https://gallery.alteryx.com/#!app/PDF-Input/5b685aff0462d710907f7a3b

You'll also need to install the R package 'pdftools' for the macro to work.

8 - Asteroid

In a regular Jupyter notebook in Anaconda, I can do the same in Python, using Tabula or Camelot.

Alteryx Certified Partner

If you already have Python code that extracts data from PDFs, you can use it in the Python tool. You can write data out of the Python tool using the Alteryx.write() function. From there, you can output to .yxdb using an Output Data tool.
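As a rough illustration of that hand-off, here is a minimal sketch of a Python tool cell. The helper rows_from_text and the sample text are hypothetical stand-ins for whatever your existing extraction code produces; the only Alteryx-specific call is Alteryx.write(), which takes a pandas DataFrame and an output anchor number. The ayx module is only available inside Alteryx Designer, so the import is guarded.

```python
# Sketch of a Python tool cell. The ayx module only exists inside Alteryx Designer,
# so the import is guarded to let the rest of the cell run elsewhere.
try:
    from ayx import Alteryx
    import pandas as pd
except ImportError:
    Alteryx = None
    pd = None


def rows_from_text(text):
    """Hypothetical helper: one record per non-empty line of extracted PDF text."""
    return [
        {"LineNumber": i, "Text": line.strip()}
        for i, line in enumerate(text.splitlines(), start=1)
        if line.strip()
    ]


# Stand-in for the text your own PDF extraction code returns.
sample_text = "Invoice 1001\n\nTotal: 250.00\n"
records = rows_from_text(sample_text)

if Alteryx is not None:
    # Send the records to output anchor 1 of the Python tool.
    Alteryx.write(pd.DataFrame(records), 1)
```

Connecting an Output Data tool to anchor 1 would then let you save the records as a .yxdb file.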


8 - Asteroid

Hi @OllieClarke,

I am just looking into reading PDFs with Alteryx and found topics on using Python code. One of the discussions mentioned the need to install additional software. Is this required, or, if I have the Python code, can I run it with Alteryx alone?

 

This is the discussion that mentions needing additional software installed:

Extracting Tabular Data from PDF Documents with Python Code Tool

 

Thanks

8 - Asteroid

Hi,

 

I did the PDF-to-Excel conversion outside of Alteryx, in a Jupyter notebook using tabula, and then did the comparison in Alteryx.

It works for me.

8 - Asteroid

Hi @rohit782192, I am hoping to work within Alteryx only, but it's good to know other options. What is tabula?

8 - Asteroid
tabula-py is a simple Python wrapper of tabula-java, which can read tables in a PDF. You can read tables from a PDF and convert them into a pandas DataFrame. tabula-py also lets you convert a PDF file into a CSV/TSV/JSON file.
Thanks and Regards
Rohit Gupta.
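
For reference, the two uses described above might look roughly like this. It is only a sketch: it assumes tabula-py is installed and that a Java runtime is available (tabula-java runs on the JVM), and the function names extract_tables, pdf_to_csv and csv_path_for are made up for illustration. tabula is imported inside the functions so the rest of the cell still loads when the package is missing.

```python
from pathlib import Path


def csv_path_for(pdf_path):
    """Hypothetical helper: same file name as the PDF, with a .csv extension."""
    return str(Path(pdf_path).with_suffix(".csv"))


def extract_tables(pdf_path):
    """Return the tables tabula-py finds in the PDF as pandas DataFrames."""
    import tabula  # lazy import; needs tabula-py and a Java runtime

    # pages="all" scans the whole document instead of only the first page.
    return tabula.read_pdf(pdf_path, pages="all")


def pdf_to_csv(pdf_path):
    """Write the tables found in the PDF to a CSV file and return its path."""
    import tabula  # lazy import; needs tabula-py and a Java runtime

    out_path = csv_path_for(pdf_path)
    tabula.convert_into(pdf_path, out_path, output_format="csv", pages="all")
    return out_path
```

Calling extract_tables on your own PDF should give you a list of DataFrames, one per detected table, in recent tabula-py versions.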
8 - Asteroid

@rohit782192 Hi Rohit. Is there a way to automate this?

Alteryx Certified Partner

@sonseeahray - attached is a sample workflow that uses the Python tabula package, for your reference. Modify the PDF file location in the Text Input tool and run the workflow. The code will install the tabula package and extract table data from your PDF.

 

You will have to run Alteryx as an Administrator to be able to install Python packages. Make sure to comment out this line (Package.installPackages(['pandas','numpy','tabula'])) after you run it the first time, so the packages are not reinstalled on every run.
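
As an alternative to commenting the line out by hand, one possible approach is to install only what is missing. This is a sketch, not the code from the attached workflow: missing_modules is a hypothetical helper, and the module names are taken from the install line quoted above. Note that a package's pip name and import name can differ (tabula-py, for example, is imported as tabula), so adjust the list to match what your workflow actually needs.

```python
import importlib.util


def missing_modules(module_names):
    """Hypothetical helper: names from module_names that cannot be imported here."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Names taken from the install line in the post; adjust if the pip name differs.
required = ["pandas", "numpy", "tabula"]
to_install = missing_modules(required)

if to_install:
    try:
        from ayx import Package  # only available inside the Alteryx Python tool
    except ImportError:
        print("Not running inside Alteryx; would install:", to_install)
    else:
        # Installing still requires running Alteryx as an Administrator.
        Package.installPackages(to_install)
```

On later runs the list of missing modules should be empty, so the install step is skipped.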

 

Hope this helps. 
