Alteryx Designer Desktop Discussions

mb1824 · ‎05-16-2019

Hi,

I use the PDF Parser at http://silvercoders.com/en/products/doctotext/ to convert PDF files to text files. Most of the time it works really well.

However, it doesn't work well on one particular file that I receive every month. The file has 4 pages with 6 tables on each page, and every table uses the same fields for its row labels, column labels & amounts.

I think (not 100% sure) the file is generated by Cognos/TM1. The doctotext converter runs as normal, but I can't extract the data from its output with the usual Alteryx tools (I mostly use RegEx / Multi-Row Formula / Filter). The row/column labels & amounts are scattered throughout the text and there are no repeated patterns to work with.
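To show the kind of reconstruction I think would be needed, here is a minimal Python sketch. It assumes each word could be obtained together with its position on the page, which the plain text I get from the converter does not include. The `(x, y, text)` tuples, the `rows_from_words` name and the `y_tolerance` value are all hypothetical, just for illustration.

```python
def rows_from_words(words, y_tolerance=2.0):
    """Group (x, y, text) tuples into table rows.

    Words whose y value is within y_tolerance of the first word of the
    current row are treated as the same row; each row is then ordered
    left to right by x.
    """
    rows = []  # each entry: (reference_y, [(x, text), ...])
    for x, y, text in sorted(words, key=lambda w: (w[1], w[0])):
        if rows and abs(y - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))
    return [
        [text for _, text in sorted(cells, key=lambda c: c[0])]
        for _, cells in rows
    ]


# Hypothetical sample: labels and amounts arrive out of order.
sample = [
    (50, 100.5, "120"),
    (10, 100.0, "Sales"),
    (10, 120.0, "Costs"),
    (50, 119.2, "80"),
]
table = rows_from_words(sample)
```

With positions available, each label ends up on the same row as its amount again. Without them I don't see a reliable way to do this from the text alone.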

I can export the PDF to XLSX using the converter within Adobe Reader (I have paid for Adobe Export PDF), but I don't know how to make that happen within Alteryx, and I am trying to avoid a manual step outside Alteryx.

I have asked many times for the file to be sent as XLSX or CSV instead, and eventually gave up.

Do you have any ideas?