Reformatting a text file converted from PDF

JVM15
7 - Meteor

Hello,

 

I have a PDF of a client's return from a state website. I was able to convert the PDF to text, but the conversion placed all the different rows and columns into one field. My Alteryx skills are not at this level of problem solving. Does anyone know where I can start? Ideally I'd like to get one table with each location code's information and tax type so I can do a quick analysis by location. Right now this is a time-consuming manual entry job; the copies attached below represent one month's data.

 

If this is not possible let me know.

 

I attached one of the PDFs and the sample converted file in .xlsx format.

 

Thanks,

5 REPLIES
TrevorS
Alteryx Alumni (Retired)

Hello @JVM15 


Do you need this data in .txt format?
Are you currently converting the data into Excel manually?


The post below shows how you can convert a PDF into an Excel document through Alteryx.
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/read-multiple-PDF-files-in-Alteryx-and...
The resources in this post may also be helpful for you!
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/PDF-as-Data-source/td-p/158064

I hope this gets you going in the right direction!
TrevorS

Community Moderator
JVM15
7 - Meteor

Hi @TrevorS 

 

Thanks for the response.

 

Can you elaborate on what you mean by 'Do you need the data in .txt format?'

The data I sent in the Excel file is the output from a PDF to CSV macro.

 

The manual part I mentioned is taking the data for each jurisdiction and adding it up to get the total tax by jurisdiction, to assist with reconciliation. Each jurisdiction can appear more than once on each location page. This is our current process.
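
As a rough illustration of that roll-up outside Designer, here is a minimal Python sketch. The row layout (location code, jurisdiction, tax amount) and the sample values are hypothetical, not taken from the attached files; it only shows summing repeated jurisdictions within each location. In Designer, the equivalent would roughly be a Summarize tool that groups by location and jurisdiction and sums the tax field.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical parsed rows: (location code, jurisdiction, tax amount).
# The same jurisdiction can appear more than once for a location.
rows = [
    ("001", "Example County", Decimal("100.00")),
    ("001", "Example City", Decimal("25.50")),
    ("001", "Example County", Decimal("40.25")),
    ("002", "Example County", Decimal("10.00")),
]

def total_by_jurisdiction(rows):
    """Sum tax per (location, jurisdiction) pair."""
    totals = defaultdict(Decimal)  # a missing key starts at Decimal("0")
    for location, jurisdiction, tax in rows:
        totals[(location, jurisdiction)] += tax
    return dict(totals)

totals = total_by_jurisdiction(rows)
for (location, jurisdiction), tax in sorted(totals.items()):
    print(location, jurisdiction, tax)
```

Decimal is used instead of float so that currency amounts add up exactly, which matters for reconciliation.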

 

Before asking, I had reviewed both of the sources you linked. While both would solve my problem, I can't download the macro because our organization is on an older version of Alteryx, and I can't install the software in the second source because it is not approved software.

 

Any other recommendations?

 

Thanks,

TrevorS
Alteryx Alumni (Retired)

Hello @JVM15 

I was looking into this further and realized that the way your PDF was being converted is part of the problem.

I would recommend looking into a different PDF to CSV conversion tool such as Zamzar.

 

I re-converted your original PDF, which gave me a much cleaner set of data to work with (attached and saved as an .xlsx document).

The attached workflow shows what I did with your CSV file. You can use this format, but it will take a considerable amount of RegEx parsing and Text To Columns work to complete.
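
To give a feel for the kind of parsing involved, here is a minimal Python sketch. It assumes a purely hypothetical line layout (location code, jurisdiction name, and tax amount separated by two or more spaces); the real converted file will look different, so the pattern would need to be adapted. A similar expression could be tried in the RegEx tool's Parse mode.

```python
import re

# Hypothetical layout: "<location code>  <jurisdiction>  <tax amount>",
# with fields separated by runs of two or more spaces.
LINE_PATTERN = re.compile(
    r"^(?P<location>\d+)\s{2,}(?P<jurisdiction>.+?)\s{2,}(?P<tax>[\d,]+\.\d{2})$"
)

def parse_line(line):
    """Return a dict of fields, or None when the line is not a data row."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "location": match.group("location"),
        "jurisdiction": match.group("jurisdiction"),
        # Strip thousands separators before converting to a number.
        "tax": float(match.group("tax").replace(",", "")),
    }

# Made-up sample lines, including one non-data line that should be skipped.
sample = [
    "001  Example County  1,234.56",
    "Page 1 of 3",
    "001  Example City  78.90",
]
rows = [row for row in (parse_line(text) for text in sample) if row is not None]
```

Lines that do not match the pattern (page headers, blank lines) come back as None and are filtered out, which is roughly what a Filter tool after the RegEx tool would do.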


I would recommend checking out these resources below as well.
Parsing Interactive lesson

Tool Mastery: Regex Tool
Tool Mastery: Text to Column

 

I hope this helps to get you started!

Community Moderator
JVM15
7 - Meteor

@TrevorS 

 

The PDF conversion you made is definitely more manageable. I can work with something like this.

 

Thanks so much for your time and assistance with this! You've saved me a lot of time!

TrevorS
Alteryx Alumni (Retired)

@JVM15 

I'm happy to hear it! 

Good luck, and don't hesitate to reach out if you need any more assistance!

Community Moderator