Alteryx Designer Discussions


PDF convert

5 - Atom

Is there a tool in Alteryx to convert a large number of PDFs to .txt in bulk?

Alteryx

Hi - This set of tools is called Text Mining and is part of the Intelligence Suite. You can read more here:

https://www.alteryx.com/products/alteryx-platform/intelligence-suite

https://help.alteryx.com/2020.2/ToolCategories/TextMining.htm

 


Alteryx

The new Text Mining tools mentioned above work great if you have a file that you want to pull very specific pieces of information from - think data coming from an invoice or a form. They are an additional package you'd need to purchase, but they work very well in these situations.

If you need to pull data from large tables in a PDF that span multiple pages, I suggest using the older PDF Input tool. It can be found here:

https://gallery.alteryx.com/#!app/PDF-Input/5b685aff0462d710907f7a3b

 

Hope this helps!

Alteryx Partner

Hello @Graceyahiro,

Is it a PDF with a large number of pages or a large number of PDFs?

9 - Comet

@Graceyahiro - I am on an older version of Alteryx, and the Intelligence Suite in 2020.2 comes at an additional cost. I use the R library pdftools or Tesseract to parse my PDFs. When I have multiple PDF files, I use a macro I created to process them in batch. Below is an article that you may find interesting; it has some further links to read:

https://community.alteryx.com/t5/Alteryx-Designer/PDF-Parsing-in-Alteryx-using-R/ta-p/82627
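
For anyone who just wants the bulk PDF-to-.txt step outside of a macro, here is a minimal sketch in Python. It is not the macro or the R approach described above: it swaps in the pypdf package (assumed installed via pip) for text extraction, and the function names `extract_text_pypdf` and `convert_folder` are made up for illustration. Like pdftools, it only reads the embedded text layer, so scanned PDFs would still need OCR such as Tesseract.

```python
from pathlib import Path
from typing import Callable, List


def extract_text_pypdf(pdf_path: Path) -> str:
    """Extract the text layer of one PDF with pypdf (assumption: pip install pypdf)."""
    # Imported lazily so the batch helper below has no hard dependency on pypdf.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() may yield nothing for scanned, image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def convert_folder(
    src_dir: Path,
    dst_dir: Path,
    extract: Callable[[Path], str] = extract_text_pypdf,
) -> List[Path]:
    """Write one .txt file per *.pdf found directly in src_dir; return the paths written."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Note: the "*.pdf" pattern is case-sensitive on most non-Windows systems.
    for pdf_path in sorted(src_dir.glob("*.pdf")):
        out_path = dst_dir / (pdf_path.stem + ".txt")
        out_path.write_text(extract(pdf_path), encoding="utf-8")
        written.append(out_path)
    return written
```

Calling something like `convert_folder(Path("pdfs"), Path("txt"))` (both folder names are placeholders) would write `txt/<name>.txt` for each `pdfs/<name>.pdf`. The `extract` parameter is there so a different extractor, for example an OCR step, could be plugged in without changing the loop.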
