Multi Format PDF Parsing


Hi All,


I'm working on a parsing project involving multi-page PDFs with varying formats and table structures. RegEx expressions have been a big help; however, because the structure of the text and numerical tables varies from file to file, the expressions are not yet fully reliable.


Many thanks to Chad for his post, "Can Alteryx Parse a Word Doc or PDF?", found below. His workflow using doctotext.exe gave me a solid foundation to begin this project.


I'm attaching a sample of the PDFs I'm working with, as well as the modified workflow. Ideally, I'd like to be able to isolate the "Investments" table without the need for an external parser such as Tabula.
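
To make the goal concrete, here is a minimal sketch, in Python rather than Alteryx, of the kind of isolation I mean. It is only an illustration under assumptions: the function names, the "Total Investments" end marker, the two-or-more-spaces column separator, and the sample text are all hypothetical and not taken from the attached PDFs.

```python
import re

def isolate_section(text, start_heading="Investments",
                    end_headings=("Total Investments",)):
    """Return the non-blank lines that follow a line consisting only of
    start_heading, stopping at the first line that begins with one of
    end_headings. Both headings are assumptions about the layout."""
    start_re = re.compile(rf"^\s*{re.escape(start_heading)}\s*$", re.IGNORECASE)
    end_re = re.compile(
        "|".join(rf"^\s*{re.escape(h)}\b" for h in end_headings),
        re.IGNORECASE,
    )
    collecting = False
    rows = []
    for line in text.splitlines():
        if not collecting:
            # Skip everything until the start heading is seen.
            if start_re.match(line):
                collecting = True
            continue
        if end_re.match(line):
            break  # reached the end marker; stop collecting
        if line.strip():
            rows.append(line.strip())
    return rows

def split_row(row):
    """Split one table row into cells, assuming columns are separated
    by runs of two or more whitespace characters."""
    return re.split(r"\s{2,}", row)

# Hypothetical extracted text, made up purely for illustration.
sample = """Summary
Investments
Fund A    1,000    12.5%
Fund B    2,500    31.0%
Total Investments    3,500
Liabilities
Loan X    500
"""

table = [split_row(r) for r in isolate_section(sample)]
```

The idea is to anchor on the headings rather than on the row contents, since the rows are the part that varies most between files. The same start/end logic could presumably be expressed with the RegEx and Filter tools in a workflow.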


Thank you for your time, and I greatly appreciate any insight or suggestions!




I'm also including a packaged workflow.