Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Multi Format PDF Parsing

Meteoroid

Hi All,

I'm working on a parsing project involving multi-page PDFs with varying formats and table structures. Regular expressions have been a big help; however, because the structure of the text and numeric tables varies from file to file, the expressions are not yet fully reliable.

Many thanks to Chad for his post, "Can Alteryx Parse a Word Doc or PDF?", linked below. His workflow, which uses doctotext.exe, gave me a solid foundation for this project.

http://community.alteryx.com/t5/Alteryx-Knowledge-Base/Can-Alteryx-Parse-A-Word-Doc-Or-PDF/ta-p/1156

I'm attaching a sample of the PDFs I'm working with, along with the modified workflow. Ideally, I'd like to isolate the "Investments" table without relying on an external parser such as Tabula (http://tabula.technology/).
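
To make the goal concrete, here is a rough Python sketch of the kind of isolation I have in mind, applied to text that has already been extracted from the PDF. It rests on assumptions that may not hold for every file: that the table starts at a line reading only "Investments", that it ends at the first blank line, and that columns are separated by two or more spaces. The function names and the sample text are made up for illustration and are not taken from the attached PDFs or workflow.

```python
import re

def isolate_section(text, start_heading="Investments", end_pattern=r"^\s*$"):
    """Return the lines between a heading line and the first line matching end_pattern.

    Assumption: the heading sits alone on its own line, and the section
    ends at the first blank line (the default end_pattern).
    """
    start_re = re.compile(rf"^\s*{re.escape(start_heading)}\s*$", re.IGNORECASE)
    end_re = re.compile(end_pattern)
    collected = []
    inside = False
    for line in text.splitlines():
        if not inside:
            # Skip everything until the heading line is found.
            if start_re.match(line):
                inside = True
            continue
        if end_re.match(line):
            break  # first terminator line after the heading closes the section
        collected.append(line)
    return collected

def split_row(line):
    """Split a table row into cells on runs of two or more whitespace characters."""
    return re.split(r"\s{2,}", line.strip())

# Made-up sample text standing in for the extracted PDF content.
sample = """Summary
Some intro text

Investments
Fund A    1,000    5.2%
Fund B    2,500    3.1%

Liabilities
Loan X    500
"""

rows = [split_row(line) for line in isolate_section(sample)]
for row in rows:
    print(row)
```

A sketch like this breaks as soon as a blank line directly follows the heading, the table spans a page break, or a cell contains a double space, which is essentially the reliability problem described above.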

Thank you for your time, and I greatly appreciate any insight or suggestions!

Andrew

Meteoroid

Also including a packaged workflow.

Thanks!

Andrew
