New PDF/Data mining tools

JLMToth
7 - Meteor

Hello,

I am attempting to work with the PDF tools that recently came out. I've been having great luck with the reports I have where the format is standard/one page. But the ones I am working on at the moment are Crystal Reports output (blech) and come out in different lengths and positions every month from a third party who is incapable of providing any other format. Is there a way to get the new tool to just read the whole thing so I can delimit it after the fact? That was how we did it with a homegrown macro, but it only works on my old version of Alteryx and I really hate having to keep a second version to run ONE thing. I would really like to update this. As it stands, though, the furthest I can get is pulling in the file names from the directory. Without an image template I can't get any farther, and doing the template as one box on the first page only gives me a file for each page with no data on any of them.

5 REPLIES
danilang
19 - Altair

Hi @JLMToth

Is there any chance that you could share one of these input files? We need to examine the format to see what the best solution would be.

If the reports don't need to be OCRed, i.e. they're straight text wrapped in a PDF, you may want to consider using the Information Lab's PDF Input tool. Unlike the Alteryx tool, this one doesn't require any messing around with input areas. It just takes the PDF and extracts all the text from it. Once you have the text, you can use the standard data prep methods to get the various sections and tables.
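To illustrate the "extract everything, then prep" idea, here is a minimal Python sketch of one way the raw text could be broken into rows and fields. It assumes, hypothetically, that the extracted text keeps columns apart with runs of two or more spaces; the function name `split_rows` and the sample text are made up for illustration and are not from the actual report.

```python
import re

def split_rows(raw_text):
    """Split extracted PDF text into rows of fields.

    Assumption (hypothetical): columns are separated by two or more
    whitespace characters, while single spaces stay inside a field.
    """
    rows = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from the page layout
        rows.append(re.split(r"\s{2,}", line))
    return rows

# Made-up sample standing in for text pulled out of a PDF
sample = "Check No   Payee        Amount\n\n1001       Acme Corp    250.00\n"
rows = split_rows(sample)
```

The same split can be done inside Designer with a RegEx or Text To Columns tool; the Python version is only meant to show the logic.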

 

There's also a very good video here on how to install the .yxi

Dan

EricaR
Alteryx

Yes. You don't have to use the template. You can use the PDF tools without annotating anything or connecting a template to the bottom input. This will read everything in, and you can use regex or other parsing methods as needed from that point.

Make sure you add a Browse tool. It often looks like nothing is being brought in, but the data is there. You might want to experiment with a regex that jumps to the first piece of data you KNOW you should see, to work out how to parse out the empty space that appears at the beginning of the data pull.
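As a rough sketch of the "regex to the first known piece of data" suggestion, the Python below searches the raw text for a pattern you expect to see and drops everything before it. The helper name `trim_to_anchor`, the anchor text "Check Register", and the sample string are all hypothetical; substitute whatever value you know appears at the start of your report.

```python
import re

def trim_to_anchor(raw_text, anchor_pattern):
    """Return raw_text starting at the first match of anchor_pattern.

    Returns None when the anchor is not found, so a missing anchor is
    easy to tell apart from an empty result.
    """
    match = re.search(anchor_pattern, raw_text)
    if match is None:
        return None
    return raw_text[match.start():]

# Made-up sample: blank lines and spaces ahead of the real content
raw = "\n\n   \n      \nCheck Register\n1001  Acme Corp  250.00\n"
trimmed = trim_to_anchor(raw, r"Check Register")
```

If the result is None, the anchor text probably differs from what was expected (extra spaces, different casing), which is itself a useful hint about how the tool is reading the page.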

JLMToth
7 - Meteor

Sadly, I cannot share it, as it is 90% check numbers and personal info. I used to use a PDF tool like the one you mention to read it in as straight text, but when I updated my version of Alteryx it stopped working.

JLMToth
7 - Meteor

I tried running it without the template. It ran for over 12 hours but did not progress at all before I finally just clicked stop.

echuong1
Alteryx Alumni (Retired)

What PDF tool did you use to read in the document as straight text, and which version of Designer are you on?

There were some updates to the version of Python being used in newer versions of Designer, so that may be the cause of the tool no longer working. The tool likely needs to be updated to accommodate the change.

Try using this tool, downloadable from the public gallery. It uses R in the backend (be sure to read through the documentation, since you will likely need to install a package), so you should be able to get around the aforementioned issue. Hope this helps!

https://gallery.alteryx.com/#!app/PDF-Input/5b685aff0462d710907f7a3b 
