Alteryx Designer Desktop Discussions

briannet · ‎11-18-2021

Hello!

I have a use case where I want to extract information from PDF files. The PDFs all share the same format; however, depending on how much information each one includes, the information I need to extract can appear in slightly different locations within the PDF. I have access to Intelligence Suite. Does anyone have any suggestions on ways to accommodate these differences?

cmcclellan · ‎11-22-2021

Regex is probably the best way, but it depends on the format of the document.

I did a large project last year (in Python, not Alteryx) where we converted every PDF to text in code, ran a series of regexes over the text to pull out the information we wanted, and then processed that further.
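To illustrate the idea, here is a minimal sketch of the regex step, assuming the PDF has already been converted to plain text. The field labels ("Invoice No", "Date", "Total Due"), the pattern names, and the `extract_fields` helper are all hypothetical and would need to be adapted to the labels that actually appear in your documents. Because each pattern searches the whole text for a label rather than a fixed position, it does not matter where in the document the value ends up.

```python
import re

# Hypothetical field labels: replace these with labels from your own PDFs.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)", re.IGNORECASE
    ),
    "invoice_date": re.compile(
        r"Date\s*:\s*(\d{1,2}/\d{1,2}/\d{4})", re.IGNORECASE
    ),
    "total": re.compile(
        r"Total\s+Due\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def extract_fields(text):
    """Return a dict of field name -> first captured value, or None if absent."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        # search() scans the entire text, so the field's position is irrelevant.
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result
```

Calling `extract_fields(page_text)` on each document's text gives one record per PDF, with `None` for any field that was not found, which makes it easy to flag documents that need a closer look.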

briannet · ‎12-01-2021

Thank you for your response! Have you used the Python tools in Alteryx?

mceleavey · ‎12-01-2021

Hi @briannet ,

If you can provide an example of the PDF, I can show you how it can be parsed using the IS tools.

Depending on what you're trying to achieve there could be different approaches.

M.
