PDF Extract

briannet
6 - Meteoroid

Hello!

 

I have a use case where I want to extract information from PDF files. The PDFs all share the same format, but depending on how much information each one contains, the information I need can appear in slightly different locations within the PDF. I have access to Intelligence Suite. Does anyone have suggestions for accommodating these differences?

3 REPLIES
cmcclellan
14 - Magnetar

Regex is probably the best way, but it depends on the format of the document.

 

I did a large project last year (we used Python, not Alteryx) in which we converted every PDF to text in code, ran a series of regexes over the text to pull out the information we wanted, and then processed that further.
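
A minimal sketch of that approach, assuming the PDF has already been converted to plain text (for example with a library such as pypdf). The field names and label wording here are hypothetical and would need to match whatever labels actually appear in your documents. Each regex keys on a label rather than a position, so the value is found wherever it lands on the page.

```python
import re

# Hypothetical fields and labels: replace them with the labels used in your PDFs.
# Each pattern searches for a label and captures the value that follows it,
# so the match does not depend on where the label sits in the document.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)", re.IGNORECASE
    ),
    "invoice_date": re.compile(r"Date\s*:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(
        r"Total\s*(?:Due)?\s*:\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def extract_fields(text):
    """Return the first match for each field, or None when the label is absent."""
    results = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        results[name] = match.group(1) if match else None
    return results


# Two made-up documents with the same fields in different places.
short_doc = "ACME Corp\nInvoice Number: INV-1001\nDate: 2023-04-01\nTotal Due: $1,250.00\n"
long_doc = (
    "ACME Corp\nNotes: extra line\nMore notes\nDate: 2023-05-15\n"
    "Some items\nInvoice No. INV-2002\nTotal: 99.50\n"
)

for doc in (short_doc, long_doc):
    print(extract_fields(doc))
```

A missing label yields None instead of raising an error, which makes it easy to flag documents that need a closer look or an additional pattern.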

briannet
6 - Meteoroid

Thank you for your response! Have you used the Python tools in Alteryx?

mceleavey
17 - Castor

Hi @briannet ,

 

If you can provide an example of the PDF, I can show you how it is parsed using the Intelligence Suite (IS) tools.

Depending on what you're trying to achieve, there could be different approaches.

 

M.


