Convert PDF file to TXT
Hi,
I use the PDF Parser at http://silvercoders.com/en/products/doctotext/ to convert PDF files to text files. Most of the time it works really well.
However, it doesn't do such a good job on one particular file that I receive every month. The file contains 4 pages with 6 tables on each page, and every table uses values from the same fields for its row labels, column labels and amounts.
I think (not 100% sure) the file is generated by Cognos/TM1. The doctotext converter runs as normal, but it is impossible to use my usual Alteryx tools (I mostly use RegEx / Multi-Row Formula / Filter) to extract the data from its output. The row/column labels and amounts are spread all over the place, and there are no repeated patterns to work with.
I can export the PDF to XLSX using the converter within Adobe Reader (I have paid for Adobe Export PDF), but I don't know how to make that happen within Alteryx, and I am trying to avoid a manual step outside Alteryx.
I have asked many times for the file to be sent as XLSX or CSV, and eventually gave up.
Do you have any ideas?
- Labels:
- Input
- Preparation
Hey @mb1824
I used this great guide the other day: https://oliverpower.wordpress.com/2018/02/08/parsing-pdfs-using-alteryx-and-a-little-r/
It worked perfectly.
Neil
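For anyone wondering how scattered labels and amounts can be pulled back into table rows, here is a minimal Python sketch of one common idea: group words by their vertical position, then order each group left to right. It is not taken from the linked guide. It assumes you already have some extractor that reports an (x, y) position for every word; the function name `group_words_into_rows`, the `y_tolerance` value and the sample data are all hypothetical and only there for illustration.

```python
def group_words_into_rows(words, y_tolerance=2.0):
    """Group (x, y, text) word tuples into rows of text.

    Words whose y value is within y_tolerance of the first word
    of a row are treated as belonging to that row. Rows come back
    in ascending y order, cells in ascending x order.
    """
    rows = []  # each entry: [anchor_y, [(x, text), ...]]
    for x, y, text in sorted(words, key=lambda w: (w[1], w[0])):
        if rows and abs(y - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append((x, text))   # same row as the previous word
        else:
            rows.append([y, [(x, text)]])   # start a new row
    return [
        [text for _, text in sorted(cells, key=lambda c: c[0])]
        for _, cells in rows
    ]


# Hypothetical word positions, deliberately out of reading order.
sample_words = [
    (200, 100.5, "1,250.00"),
    (50, 100.0, "Revenue"),
    (50, 120.0, "Costs"),
    (200, 119.6, "830.00"),
]

table = group_words_into_rows(sample_words)
print(table)
```

Whether ascending y means top to bottom or bottom to top depends on the extractor's coordinate system, so the row order may need reversing. Inside Alteryx, logic like this would have to run in a code tool rather than in RegEx or Multi-Row Formula.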
Thanks, I will try that out.
Hi,
Did you find a solution for this? I am new to Alteryx and am having the same difficulty.
I haven't got back to this to try the suggestion from @LordNeilLord.
@mb1824 There are some tools in the Gallery that you can use to parse PDFs.
https://gallery.alteryx.com/#!app/PDF-Input--Text-and-Image-/5be5ec8d0462d71ffce6deaa
https://gallery.alteryx.com/#!app/PDF-Input/5b685aff0462d710907f7a3b
Give them a try 🙂
