Python Code for Comparison in Alteryx.
Hello Team,
Do we have any proper Python code that extracts data from a PDF file, writes it to an Alteryx database, and does a comparison?
Hi @rohit782192, it's not Python (although you could definitely do it in Python), but I made an R-based macro which reads PDFs into Alteryx as text. You can find it here: https://gallery.alteryx.com/#!app/PDF-Input/5b685aff0462d710907f7a3b
You'll also need to install the R package 'pdftools' for the macro to work.
In a regular Jupyter notebook in Anaconda I can do the same in Python, using Tabula or Camelot.
If you already have Python code that extracts data from PDFs, you can use it in the Python tool. You can write data out of the Python tool using the Alteryx.write() function. From there you can output to .yxdb using an Output tool.
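A minimal sketch of what the Python tool cell might look like, assuming the extraction step has already produced a header and a list of rows. Alteryx.write() is the call mentioned above; rows_to_records and send_to_alteryx are hypothetical helper names, and the invoice data is made up for illustration. The ayx import only works inside the Alteryx Python tool, so it is kept inside the function and the call is left commented out.

```python
def rows_to_records(header, rows):
    """Turn a header plus raw table rows into a list of dicts, one per row.

    Short rows are padded with None so every record has the same keys.
    """
    records = []
    for row in rows:
        padded = list(row) + [None] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


def send_to_alteryx(records, anchor=1):
    """Write the records to an output anchor of the Python tool."""
    import pandas as pd      # available in the Alteryx Python tool environment
    from ayx import Alteryx  # only importable inside Alteryx

    Alteryx.write(pd.DataFrame(records), anchor)


# Illustrative data standing in for whatever the PDF extraction returned.
header = ["invoice", "amount"]
rows = [["A-100", "250.00"], ["A-101"]]
records = rows_to_records(header, rows)

# Inside the Python tool, uncomment the next line to feed output anchor 1:
# send_to_alteryx(records, 1)
```

Connecting an Output tool to anchor 1 of the Python tool would then let the workflow save the result as .yxdb.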
Hi @OllieClarke,
I am just looking into Alteryx's ability to read PDFs and found topics on using Python code. One of the discussions mentioned the need to install additional software. Is this required, or, if I have the Python code, can I run it in Alteryx alone?
This is the discussion that mentions needing additional software installed:
Extracting Tabular Data from PDF Documents with Python Code Tool
Thanks
Hi,
I did the PDF-to-Excel conversion outside of Alteryx, in a Jupyter notebook using tabula, and then did the comparison in Alteryx.
It works for me.
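The thread does not show how the comparison itself was done. If someone wanted to do that step in Python rather than with Alteryx tools, a keyed comparison of two extracted tables might look like the sketch below. It uses only the standard library and assumes both extracts share a key column; the column names, the sample rows, and the helper names load_keyed and compare_tables are all made up for illustration.

```python
import csv
import io


def load_keyed(csv_text, key):
    """Read CSV text into a dict mapping the key column to the full row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def compare_tables(old, new):
    """Return the keys that were added, removed, or changed between two tables."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


# Two small extracts standing in for the old and new PDF tables.
old_csv = "invoice,amount\nA-100,250.00\nA-101,75.50\n"
new_csv = "invoice,amount\nA-100,250.00\nA-101,80.00\nA-102,12.00\n"

result = compare_tables(load_keyed(old_csv, "invoice"),
                        load_keyed(new_csv, "invoice"))
print(result)
# {'added': ['A-102'], 'removed': [], 'changed': ['A-101']}
```

Values are compared as text here, so "80.00" and "80" would count as a change; converting amounts to numbers first would be a reasonable refinement.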
Hi @rohit782192, I am hoping to work within Alteryx only, but it's good to know other options. What is tabula?
tabula-py is a simple Python wrapper of tabula-java, which can read tables in a PDF. You can read tables from a PDF and convert them into a pandas DataFrame. tabula-py also enables you to convert a PDF file into a CSV/TSV/JSON file.
Thanks and Regards
Rohit Gupta.
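As a rough illustration of the two uses described above, the sketch below wraps tabula's read_pdf and convert_into calls. It assumes a recent tabula-py release (where read_pdf returns a list of DataFrames) and a Java runtime on the machine, since tabula-py calls tabula-java under the hood. The helper names and the file name report.pdf are hypothetical. Note that the tabula-py documentation gives the PyPI package name as tabula-py, even though the module is imported as tabula.

```python
from pathlib import Path


def csv_path_for(pdf_path):
    """Build the output CSV path next to the PDF (same name, .csv suffix)."""
    return str(Path(pdf_path).with_suffix(".csv"))


def extract_tables(pdf_path):
    """Return every table tabula finds in the PDF as a list of DataFrames."""
    import tabula  # provided by the tabula-py package; needs Java installed

    return tabula.read_pdf(pdf_path, pages="all")


def pdf_to_csv(pdf_path):
    """Convert the tables in a PDF straight to a CSV file and return its path."""
    import tabula

    out_path = csv_path_for(pdf_path)
    tabula.convert_into(pdf_path, out_path, output_format="csv", pages="all")
    return out_path


# Example calls (hypothetical file name; require tabula-py and Java):
# tables = extract_tables("report.pdf")
# pdf_to_csv("report.pdf")
```

The imports sit inside the functions so the file can be loaded on a machine without tabula-py; the extraction only runs when one of the functions is called.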
@rohit782192 Hi Rohit. Is there a way to automate this?
@sonseeahray - attached is a sample implementation using the Python tabula package for your reference. Modify the PDF file location in the Text Input tool and run the workflow. The code will install the tabula package and extract table data from your PDF.
You will have to run Alteryx as an Administrator to be able to install Python packages. Make sure to comment out this line (Package.installPackages(['pandas','numpy','tabula'])) after you run it the first time.
Hope this helps.
