Hello!
I have a use case where I want to extract information from PDF files. The PDFs all share the same format; however, depending on how much information each one contains, the fields I need to extract can appear in slightly different locations within the PDF. I have access to Intelligence Suite. Does anyone have suggestions for accommodating these differences?
Regex is probably the best way, though it depends on the format of the document.
I did a large project last year (we used Python, not Alteryx) where we converted every PDF to text in code, ran a series of regexes over the text to get the information we wanted, and then processed that further.
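To give a rough idea of what that regex pass can look like, here is a minimal sketch. It assumes the PDF has already been converted to plain text, and the field labels ("Invoice Number", "Total Due"), the sample text, and the `extract_fields` helper are all made up for illustration, not taken from the actual project. The point is that each pattern anchors on a label rather than a page position, so it does not matter where the field ends up in the document.

```python
import re

# Hypothetical field labels; replace them with the labels used in your PDFs.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s+Number:\s*(\S+)"),
    "total": re.compile(r"Total\s+Due:\s*\$?([\d,]+\.\d{2})"),
}


def extract_fields(text):
    """Search the whole text for each label, so the field's position does not matter."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        # Store None when a label is missing instead of raising an error.
        result[name] = match.group(1) if match else None
    return result


# Made-up text standing in for the output of a PDF-to-text conversion.
sample_text = """Acme Corp
Some extra notes that push the fields further down the page.
Invoice Number: INV-1042
Line item A   10.00
Total Due: $1,234.50
"""

fields = extract_fields(sample_text)
print(fields)
```

Fields that are not found come back as `None`, which makes it easy to flag documents that need a closer look or an extra pattern.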
Thank you for your response! Have you used the Python tools in Alteryx?
Hi @briannet ,
If you can provide an example of the PDF, I can show you how it can be parsed using the IS tools.
Depending on what you're trying to achieve, there could be different approaches.
M.