Hello All,
Hi @Learner09
To answer your question: yes, Alteryx can read PDFs and extract text from them.
There is already a workflow built and shared for everyone's use. The link is below.
Please mark helpful answers as a solution so that future users with the same problem can find them more easily!
Many thanks
Shanker V
@ShankerV, thank you for sharing, but it is showing a run bat error.
Hi @Learner09,
@ShankerV has provided one way of reading in PDFs and extracting text from them, using just the base Designer platform.
Another way would be to use the Intelligence Suite add-on to extract the text from a PDF using the Image Input tool (https://help.alteryx.com/20223/designer/image-input). This can extract paragraphs, tables, and images from PDFs. Furthermore, Intelligence Suite also provides the Text Mining and Assisted Modelling tool categories to supercharge your analytics.
I've attached some screenshots. If there's interest, the Intelligence Suite trial can be downloaded here (https://www.alteryx.com/intelligence-suite-trial/intelligence-suite-trial) and the Intelligence Suite starter kit can be downloaded here (https://www.alteryx.com/starter-kit/intelligence-suite).
And as Shanker mentioned, do mark helpful answers as solutions (and you can mark more than 1 reply as a solution). Hope this helps!
Best,
HS
@Hongsen_T Thank you for sharing this. However, I believe the Intelligence Suite is a paid add-on: after 30 days I would have to buy it, and that is a bit difficult for me.
@Hongsen_T and @ShankerV, is there any other way to extract text from a PDF?
Hey @Learner09
If you are not planning to leverage the Intelligence Suite, a Python solution will probably be the easiest for you to implement.
There are many open-source packages you can use to read PDFs and the tables within them, so what you end up using will also depend on the type of document you need to parse.
Some of the most common packages that I've used in the past with a high rate of success are camelot, pandas, and PyPDF2.
These are very well documented, with use cases both in the Alteryx Community and elsewhere, so you should have plenty of resources to pull from.
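As a rough starting point, here is a minimal sketch of the PyPDF2 route. It assumes PyPDF2 is installed in the Python environment you run it from and that the PDF contains real text (a scanned image would need OCR instead). The function names `extract_page_texts` and `pages_to_rows` and the file name `invoice.pdf` are made up for illustration; only `PdfReader`, `reader.pages`, and `page.extract_text()` come from PyPDF2 itself.

```python
def extract_page_texts(pdf_path):
    """Return one text string per page of the PDF at pdf_path.

    PyPDF2 is imported here rather than at module level so the
    row-shaping helper below can be used without it installed.
    """
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can come back empty for image-only pages,
    # so fall back to an empty string to keep one entry per page.
    return [page.extract_text() or "" for page in reader.pages]


def pages_to_rows(page_texts):
    """Flatten page texts into one record per non-blank line.

    Each record keeps the page number and the line's position on
    that page, so the output can be filtered or parsed downstream.
    """
    rows = []
    for page_number, text in enumerate(page_texts, start=1):
        for line_number, line in enumerate(text.splitlines(), start=1):
            cleaned = line.strip()
            if cleaned:
                rows.append(
                    {"page": page_number, "line": line_number, "text": cleaned}
                )
    return rows


# Hypothetical usage (the path is a placeholder):
# rows = pages_to_rows(extract_page_texts("invoice.pdf"))
```

If you run this inside the Python tool in Designer, one option is to load the rows into a pandas DataFrame and send it to an output anchor with `Alteryx.write(df, 1)` from the `ayx` package, then carry on with the usual parsing tools. Treat that as one possible setup, not the only way to wire it up.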