Alteryx Designer Desktop Discussions

PDF Extract

briannet
6 - Meteoroid

Hello!

I have a use case where I want to extract information from PDF files. The PDFs all share the same format; however, depending on how much information each one contains, the information I need could be in slightly different locations within the PDF. I have access to Intelligence Suite. Does anyone have suggestions for accommodating these differences?

3 REPLIES
cmcclellan
13 - Pulsar

Regex is probably the best way, but it depends on the format of the document.

I did a large project last year (we used Python, not Alteryx). We converted every PDF to text in code, ran a series of regexes over the text to pull out the information we wanted, and then processed that further.
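
For illustration, the convert-to-text-then-regex approach described above might look roughly like the sketch below. It is not the code from that project. The field names and patterns (`invoice_number`, `total`) are hypothetical placeholders, and `pdf_to_text` assumes the third-party `pypdf` package is installed. The point is that each pattern anchors on a label in the text rather than on a page position, so a value can shift around the document and still be found.

```python
import re

# Hypothetical field patterns. Each one anchors on a label in the text,
# not on a page position, so the value can move between documents.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)", re.IGNORECASE
    ),
    "total": re.compile(
        r"Total\s+Due\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def pdf_to_text(path):
    """Convert a PDF to one text string (assumes the pypdf package)."""
    # Imported lazily so the regex step below works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() can return an empty result for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_fields(text, patterns=FIELD_PATTERNS):
    """Return the first match for each field, or None if the label is absent."""
    result = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result
```

In use, something like `extract_fields(pdf_to_text("some_file.pdf"))` would return one dictionary per document. Fields that come back as `None` flag documents where a label was missing or worded differently, which is usually the cue to add or loosen a pattern. Scanned PDFs with no text layer would need OCR first.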

briannet
6 - Meteoroid

Thank you for your response! Have you used the Python tools in Alteryx?

mceleavey
17 - Castor

Hi @briannet,

If you can provide an example of the PDF, I can show you how to parse it using the Intelligence Suite (IS) tools.

Depending on what you're trying to achieve, there could be different approaches.

M.


