PDF extract Text

Learner09
8 - Asteroid

Hello All,

 

Can Alteryx read PDFs and extract text from them? If yes, could anyone share an example workflow?
 
Thanks,
6 REPLIES
ShankerV
17 - Castor

Hi @Learner09 

 

To answer your question: yes, Alteryx can read PDFs and extract text from them.

 

There is already a workflow built and shared for everyone's use. The link is below.

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Can-Alteryx-Parse-A-Word-Doc-Or-PDF...

 

 

Please mark helpful answers as a solution so that future users with the same problem can find them more easily!!!!

 

Many thanks

Shanker V

Learner09
8 - Asteroid

[Attached screenshot: Mayank09_0-1672393588001.png]

@ShankerV, thank you for sharing, but it is showing a "run bat" error.

Hongsen_T
Alteryx

Hi @Learner09,

 

@ShankerV has provided one way of reading in PDFs and extracting text from them, using just the base Designer platform.

 

Another way would be to use the Intelligence Suite add-on to extract the text from a PDF using the Image Input tool (https://help.alteryx.com/20223/designer/image-input). This can extract paragraphs, tables, and images from PDFs. Furthermore, Intelligence Suite also provides the Text Mining and Assisted Modelling tool categories to supercharge your analytics.

 

I've attached some screenshots. If there's interest, the Intelligence Suite trial can be downloaded here (https://www.alteryx.com/intelligence-suite-trial/intelligence-suite-trial) and the Intelligence Suite starter kit can be downloaded here (https://www.alteryx.com/starter-kit/intelligence-suite).

 

And as Shanker mentioned, do mark helpful answers as solutions (and you can mark more than 1 reply as a solution). Hope this helps! 

 

Best,

HS

Teo Hong Sen
Sales Engineer
Alteryx
Learner09
8 - Asteroid

@Hongsen_T Thank you for sharing this. However, I believe the Intelligence Suite is a paid add-on: after 30 days I would have to buy it, and that is a bit difficult for me.

Learner09
8 - Asteroid

@Hongsen_T and @ShankerV, is there any other way to extract text from a PDF?

gautiergodard
13 - Pulsar

Hey @Learner09 

If you are not planning to use the Intelligence Suite, a Python solution will probably be the easiest for you to implement.

There are many open-source packages you can use to read PDFs, and tables within PDFs, so what you end up using will depend on the type of document you need to parse.

 

Some of the most common packages that I've used in the past with a high rate of success are camelot, pandas and PyPDF2.

 

These are very well documented, with use cases both in the Alteryx community and elsewhere, so you should have plenty of resources to pull from.
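
As a rough illustration of the PyPDF2 route, here is a minimal sketch, not a tested Alteryx workflow. It assumes PyPDF2 2.0 or later is installed and that the PDF has a text layer (scanned, image-only pages will come back empty). The function names `extract_page_texts` and `pages_to_rows` and the file name `invoice.pdf` are made up for this example.

```python
from typing import Dict, List, Union


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return the extracted text of each page of the PDF, in page order."""
    # Imported here so pages_to_rows() can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader  # assumption: PyPDF2 2.0 or later

    reader = PdfReader(pdf_path)
    # extract_text() may yield nothing for image-only pages, so fall back to "".
    return [page.extract_text() or "" for page in reader.pages]


def pages_to_rows(page_texts: List[str]) -> List[Dict[str, Union[int, str]]]:
    """Flatten page texts into one record per non-blank line."""
    rows: List[Dict[str, Union[int, str]]] = []
    for page_number, text in enumerate(page_texts, start=1):
        line_number = 0
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue  # skip blank lines so they do not become empty rows
            line_number += 1
            rows.append({"page": page_number, "line": line_number, "text": line})
    return rows


# Hypothetical usage ("invoice.pdf" is a placeholder path):
# rows = pages_to_rows(extract_page_texts("invoice.pdf"))
# for row in rows:
#     print(row["page"], row["line"], row["text"])
```

Keeping one record per line, with page and line numbers, makes the output easy to turn into a pandas DataFrame. Inside the Alteryx Python tool that DataFrame could then be sent to an output anchor with `Alteryx.write(df, 1)`, so the rest of the parsing can be done with regular Designer tools.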
