Hi, I'm an Alteryx beginner and need some serious help with a specific, very large dataset. This project is the primary reason for the license purchase. Can someone help build a workflow that converts the attached job costing data from PDF to a columnar (or is it tabular?) format for further analysis? The data is from the COINS ERP system. Attached is a sample of the data for ONE project in ONE year for ONE company, along with the output format I would like.
I need to build a template workflow that will allow me to convert this same type of data for thousands of projects spanning seven years for 40+ entities. The PDFs are currently separated by year and by entity (so roughly 250-300 separate, large PDF files). Once the data is properly converted, I will need to apply various lookups and blend it with 2-3 other datasets for various financial/computational analyses and reporting. I'm much more comfortable with these tasks; I just need this core data in a workable format.
From my research, it looks like I'll need to use another tool such as DocToText, R code, etc., none of which I have any experience with. I will be spinning my wheels for days. Please help.
Thanks in advance to the brave soul who takes this one. I'm at your disposal to get this solved!!!
A colleague of mine recently published a 'PDF Input' connector which, as you stated, makes use of the R tool.
Once the PDF text has been read in, you will have to parse it yourself (take a look at the RegEx and Text To Columns tools for this). My colleague also included a sample workflow in the documentation, so it's worth looking at how he converted the PDF into a structured table.
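To give a feel for what the parsing step does, here is a rough sketch in Python rather than Alteryx tools. Everything about the data is an assumption on my part: the sample lines, the cost code shape (two digits, a dash, three digits) and the column names `cost_code`, `description`, `actual` and `budget` are made up for illustration and will not match the real COINS layout. The idea is the same one you would apply with a regex in a workflow: match only the lines that look like data rows, pull out the named pieces, and skip page headers and total lines.

```python
import re

# Hypothetical text as it might come out of a PDF extraction step.
# The real COINS report layout will differ; adjust the pattern to match it.
SAMPLE_TEXT = """\
Job Cost Report   Page 1
01-100  Site Preparation     12,345.67    10,000.00
01-200  Concrete Footings     8,900.00     9,500.50
Total                        21,245.67    19,500.50
"""

# One data row = cost code, free-text description, then two amounts.
# The description ends where a run of two or more spaces begins.
ROW_PATTERN = re.compile(
    r"^(?P<cost_code>\d{2}-\d{3})\s+"
    r"(?P<description>.+?)\s{2,}"
    r"(?P<actual>-?[\d,]+\.\d{2})\s+"
    r"(?P<budget>-?[\d,]+\.\d{2})\s*$"
)


def to_number(text):
    """Convert an amount such as '12,345.67' to a float."""
    return float(text.replace(",", ""))


def parse_rows(text):
    """Return one dict per line that matches the data row pattern."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match is None:
            continue  # header, total or blank line: not a data row
        rows.append(
            {
                "cost_code": match.group("cost_code"),
                "description": match.group("description"),
                "actual": to_number(match.group("actual")),
                "budget": to_number(match.group("budget")),
            }
        )
    return rows


if __name__ == "__main__":
    for row in parse_rows(SAMPLE_TEXT):
        print(row)
```

Anchoring the pattern on the cost code is what keeps header and total lines out of the output. For your files, the first thing to work out is which piece of each data row is reliable enough to anchor on.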