Hi Alteryx Experts,
I've been working on extracting data from a few PDFs into Alteryx, but the extracted data is misaligned across multiple columns. I’ve tried multiple methods, including Transpose, Cross Tab, Multi-Row Formula, and Text to Columns, but I haven’t been able to get the expected result.
Challenges I am Facing:
What I Have Tried So Far:
Request for Help
Can anyone guide me on the best approach to correctly format this extracted data in Alteryx? Would really appreciate any suggestions, workflows, or logic to apply!
Thanks in advance!
Hi @buddhiDB!
With the way it's structured right now, I find it hard to see how you'll be able to automate it, mostly because there's no standard position for the "1st Inst." figures.
For "Aaron John Kedzlie - 2023 - IR3", the values for 2024 provisional tax, 2024 tax pooling and amounts due are not under "1st Inst." They are under column 4, which has no header.
If you told me that it's always the case for the first file name, for example, there could be a way. But this happens again with "Mt Roskill Cash 'N Carry Ltd - 2023 - IR4" after skipping "Joanna Maree Kedzlie - 2023 - IR3" and "Kedzlie Home Trust - 2023 - IR6".
I would try a different method of extracting the data from the PDF, if possible.
Hey @buddhiDB
Here's an approach which I've tried to make as flexible as possible
However, with instances like these, it's often not worth trying to be flexible. If these files only need to be read in once, then it's often quicker to just work out the alignment and use a lookup to align the headers.
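As a rough illustration of the lookup idea (in Python rather than Alteryx tools, purely as a sketch): the per-file override below comes from the "column 4" observation earlier in the thread, while `DEFAULT_HEADERS`, its column positions, and the function name `align_row` are assumptions made up for the example.

```python
from typing import Dict, Optional

# Hypothetical default layout: extracted column position -> header.
# The real positions depend on how the PDF was parsed.
DEFAULT_HEADERS: Dict[int, str] = {1: "File", 3: "1st Inst."}

# Manually worked-out exceptions, one entry per misaligned file.
# Column 4 has no header in the extract but holds the "1st Inst." values.
OVERRIDES: Dict[str, Dict[int, str]] = {
    "Aaron John Kedzlie - 2023 - IR3": {4: "1st Inst."},
}

def align_row(file_name: str, cells: Dict[int, Optional[str]]) -> Dict[str, str]:
    """Map one extracted row (position -> value) onto header names."""
    mapping = {**DEFAULT_HEADERS, **OVERRIDES.get(file_name, {})}
    return {
        mapping[pos]: value
        for pos, value in cells.items()
        if value is not None and pos in mapping
    }

# Invented sample values, only to show the shape of the result.
row = align_row(
    "Aaron John Kedzlie - 2023 - IR3",
    {1: "Aaron John Kedzlie - 2023 - IR3", 3: None, 4: "1,000"},
)
```

In a workflow the same table could live in a Text Input and be joined on file name and column position, which keeps the manual fixes in one visible place.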
The approach I took here is to find the values sitting in a column with no header, and then shift them one column left or right (+/- 1). Then I find which shift ended up with the most values aligned, and use that shift to update the column.
It's not perfect: if a value sits between a column that should be null and a column that should hold the value, there's no way of knowing which column it belongs in, at least not without further business context.
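For anyone who wants to see the shifting logic outside of a workflow, here is a minimal Python sketch of the same idea. It is not the attached workflow: the function names, the sample rows, and the "2nd Inst." header are invented, and it assumes "aligned" means the shifted value lands in an empty cell under a neighbouring column that does have a header.

```python
from typing import List, Optional

Row = List[Optional[str]]

def best_shift(rows: List[Row], headers: List[Optional[str]], col: int) -> Optional[int]:
    """Pick -1 or +1 for a header-less column, or None if no neighbour has a header."""
    scores = {}
    for shift in (-1, 1):
        target = col + shift
        if 0 <= target < len(headers) and headers[target] is not None:
            # Count values that would land in an empty cell of the neighbour.
            scores[shift] = sum(
                1 for row in rows
                if row[col] is not None and row[target] is None
            )
    if not scores:
        return None
    # On a tie this keeps the left shift; a tie is genuinely ambiguous.
    return max(scores, key=scores.get)

def realign(rows: List[Row], headers: List[Optional[str]]) -> List[Row]:
    """Move values out of header-less columns using the best-scoring shift."""
    fixed = [list(row) for row in rows]
    for col, header in enumerate(headers):
        if header is not None:
            continue
        shift = best_shift(fixed, headers, col)
        if shift is None:
            continue
        for row in fixed:
            if row[col] is not None and row[col + shift] is None:
                row[col + shift] = row[col]
                row[col] = None
    return fixed

# Invented sample: the third column has no header.
headers = ["Name", "1st Inst.", None, "2nd Inst."]
rows = [
    ["A", None, "100", "50"],   # clearly belongs one column to the left
    ["B", "200", None, "60"],   # already aligned
    ["C", None, "300", None],   # ambiguous: both neighbours are empty
]
aligned = realign(rows, headers)
```

Row "C" shows the limitation: it is moved left only because the left shift won overall for the column, not because anything in that row says it belongs there.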
Anyway, hope that helps,
Ollie