If you already have Python code that extracts data from PDFs, you can run it in the Python tool. Write the data out of the Python tool with the Alteryx.write() function, then connect an Output Data tool to save it as a .yxdb file.
I am just looking into Alteryx's ability to read PDFs and found some topics on using Python code. One of the discussions mentioned the need to install additional software. Is this required, or, if I have the Python code, can I run it in Alteryx alone?
This is the discussion that mentions needing additional software installed:
tabula-py is a simple Python wrapper of tabula-java, which can read tables in a PDF. You can read tables from a PDF and convert them into a pandas DataFrame. tabula-py also enables you to convert a PDF file into a CSV/TSV/JSON file. Thanks and Regards, Rohit Gupta.
@sonseeahray - attached is a sample implementation of the Python tabula package for your reference. Modify the PDF file location in the Text Input tool and run the workflow. The code will install the tabula package and extract the table data from your PDF.
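For anyone who cannot open the attachment, here is a rough sketch of what the Python tool code for this could look like. It is not the attached workflow itself. The function name write_first_pdf_table and the read_pdf/write parameters are made up for illustration (they let you try the logic outside Alteryx); it assumes tabula.read_pdf returns a list of DataFrames and that Alteryx.write(df, 1) sends a DataFrame to output anchor 1. Since tabula-py wraps tabula-java, I would also expect a Java runtime to be needed on the machine, which is probably the "additional software" mentioned above.

```python
def write_first_pdf_table(pdf_path, read_pdf=None, write=None, anchor=1):
    """Read the tables in a PDF and send the first one to an output anchor.

    read_pdf and write are hypothetical hooks for testing; inside the
    Alteryx Python tool they default to tabula.read_pdf and Alteryx.write.
    Returns the number of tables found.
    """
    if read_pdf is None:
        import tabula  # tabula-py; assumed to be installed already
        read_pdf = tabula.read_pdf
    if write is None:
        from ayx import Alteryx  # only importable inside the Python tool
        write = Alteryx.write

    # With pages="all", tabula-py scans every page and returns a list of tables.
    tables = read_pdf(pdf_path, pages="all")

    # Send only the first table downstream; skip the write if nothing was found.
    if tables:
        write(tables[0], anchor)
    return len(tables)
```

In the Python tool you would call it as write_first_pdf_table(r"C:\path\to\file.pdf") (placeholder path) and connect an Output Data tool to anchor 1 to save the result as .yxdb.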
You will have to run Alteryx as an Administrator to be able to install Python packages. Make sure to comment out this line (Package.installPackages(['pandas','numpy','tabula'])) after you run it the first time.
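If you would rather not comment the install line in and out by hand, one possible alternative is to install only what is missing. This is just a sketch: missing_packages and install_if_missing are hypothetical helper names, and it assumes each package's import name is known (the dictionary maps the name you install to the name you import).

```python
import importlib.util


def missing_packages(requirements):
    """Return the install names whose import module cannot be found.

    requirements maps install name -> top-level import name.
    """
    return [
        install_name
        for install_name, module_name in requirements.items()
        if importlib.util.find_spec(module_name) is None
    ]


def install_if_missing(requirements):
    """Install only the packages that are not importable yet."""
    missing = missing_packages(requirements)
    if missing:
        from ayx import Package  # only importable inside the Python tool
        Package.installPackages(missing)
    return missing
```

Calling install_if_missing({'pandas': 'pandas', 'numpy': 'numpy', 'tabula': 'tabula'}) at the top of the Python tool should then do nothing once the packages are in place. The first run still needs Alteryx started as an Administrator.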