Alteryx Designer Desktop Discussions

briannet · ‎11-18-2021

Hello!

I have a use case where I want to extract information from PDF files. The PDFs all share the same format; however, depending on how much information each one contains, the information I need to extract can appear in slightly different locations within the PDF. I have access to Intelligence Suite. Does anyone have suggestions for accommodating these differences?

cmcclellan · ‎11-22-2021

Regex is probably the best way, though it depends on the format of the document.

I did a large project last year (we used Python, not Alteryx) where we converted every PDF to text (using code), ran a series of regexes over the text to get the information we wanted, and then processed that further.
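
A minimal sketch of that regex step, assuming the PDF text has already been extracted into a string. The field labels ("Invoice No", "Total Due"), the sample text, and the function name extract_fields are hypothetical, made up for illustration, and are not taken from the project described above. The point is that each pattern anchors on a label rather than on a page position, so the value is found wherever it lands in the document.

```python
import re

# Hypothetical labels and patterns; adjust them to the real document layout.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)", re.IGNORECASE
    ),
    "total": re.compile(
        r"Total\s+Due\s*:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def extract_fields(text):
    """Search the whole text for each label, so its position does not matter."""
    results = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        # None marks a field that was not found, so it can be reviewed later.
        results[name] = match.group(1) if match else None
    return results


# Two made-up documents: same format, but the fields sit on different lines.
sample_short = "ACME Corp\nInvoice No: A-1001\nTotal Due: $1,250.00\n"
sample_long = (
    "ACME Corp\nNotes: rush order\nMore notes here\n"
    "Invoice Number A-2002\nItem list...\nTotal Due $99.50\n"
)

print(extract_fields(sample_short))
print(extract_fields(sample_long))
```

Returning None for a missing field, rather than raising an error, makes it easy to flag documents that need a new pattern or a manual check.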

briannet · ‎12-01-2021

Thank you for your response! Have you used the Python tools in Alteryx?

mceleavey · ‎12-01-2021

Hi @briannet ,

If you can provide an example of the PDF, I can show you how it can be parsed using the Intelligence Suite (IS) tools.

Depending on what you're trying to achieve there could be different approaches.

M.

PDF Extract