

Extract Data from PDF

8 - Asteroid

Hi all -


Not sure if this can even be done. However....


I have a .pdf document with house builder data. From page 55 is a list of all house builders and their contact details. What I would like to do is extract their company name (in the blue bar) and return their UK full postcode.


Any help would be greatly appreciated.



12 - Quasar

Hi @RDF25087 


There is a macro here - but there are a few pre-requisites before you can run it.


Another option, if you have the Intelligence Suite licence, is to use its built-in PDF extraction capabilities.

8 - Asteroid

Hi @DavidSkaife 


Thank you for the quick reply. It doesn't look like we have the Intelligence Suite License - so I'll take a crack at the macro solution first.



8 - Asteroid

That macro looks like it uses R to parse the PDF. IMO R does not do a good job of this. I've worked with PDF files many times in production over the past couple of decades, and the best free solution I've found is the Xpdf command-line tools. These are no-frills exe files, and the results are better than R's. There are multiple parsing options (check the --help switch). I've always had the best results with the -layout or -table switch, depending on how the document is formatted. Once the PDF is converted to text, you would use regex and logic to parse it and confirm that no data was lost in conversion.


These are command-line parameters, so remember to wrap the path or file name in "quotes" if it contains a space. Test in the shell before putting it into Alteryx.

pdftotext.exe -layout file.pdf file.txt


pdftotext.exe -table file.pdf file.txt
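
If you want to script the test run before wiring it into Alteryx, here is a minimal Python sketch of the same two calls. The function names and the example paths are made up for illustration, and it assumes pdftotext.exe is on your PATH. Passing the arguments as a list lets Python handle the quoting for paths with spaces, and the -f switch (first page to convert) is used here to start at page 55, where the builder list begins.

```python
import subprocess


def build_pdftotext_args(pdf_path, txt_path, mode="layout", first_page=None,
                         exe="pdftotext.exe"):
    """Build the argument list for an Xpdf pdftotext call.

    mode is "layout" or "table", matching the -layout and -table switches.
    first_page, if given, is passed with -f so conversion starts there.
    """
    if mode not in ("layout", "table"):
        raise ValueError("mode must be 'layout' or 'table'")
    args = [exe, f"-{mode}"]
    if first_page is not None:
        args += ["-f", str(first_page)]
    # Paths are separate list items, so spaces need no manual quoting.
    args += [str(pdf_path), str(txt_path)]
    return args


def convert(pdf_path, txt_path, **kwargs):
    """Run pdftotext and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_pdftotext_args(pdf_path, txt_path, **kwargs), check=True)


if __name__ == "__main__":
    # Hypothetical paths: show the command that would run.
    print(build_pdftotext_args("C:/my docs/builders.pdf", "builders.txt",
                               mode="layout", first_page=55))
```

Try both modes on the same file and compare the text output; which one keeps the company name and address together will depend on the page layout.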


To use this with Alteryx, you'd set up the Run Command tool and read in the text results using either the Run Command tool itself or, in special cases where the Alteryx tool mangles the results, the Blob Input tool. I have this deployed on the Gallery at work.
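
For the regex step, here is a rough sketch of pulling full UK postcodes out of the converted text. It is written in Python only so that it is easy to test outside Alteryx; the pattern is a simplified one (outward code, optional space, inward code), not the full official validation rule, and the sample text is invented. A similar pattern could likely be adapted for the RegEx tool, but treat that as an assumption to verify against your own file.

```python
import re

# Simplified UK postcode pattern: 1-2 letters, a digit, an optional
# letter or digit, optional whitespace, then a digit and two letters.
# It only matches upper-case postcodes and does not check that the
# postcode actually exists.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})\b")


def extract_postcodes(text):
    """Return every postcode-like match, normalized to 'OUT INW' form."""
    return [f"{outward} {inward}" for outward, inward in POSTCODE_RE.findall(text)]


if __name__ == "__main__":
    # Invented sample resembling pdftotext output for two builders.
    sample = (
        "Acme Homes Ltd\n"
        "1 High Street, London SW1A 1AA\n"
        "Tel: 020 7946 0000\n"
        "Builder Two, Manchester M11AE\n"
    )
    print(extract_postcodes(sample))
```

Pairing each postcode with its company name needs extra logic that depends on how the blue-bar headings come through in the text, so check a few pages by eye before trusting the output.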