Hi there,
I've input a PDF using the PDF Input macro from the Gallery (which is really good). However, I'm trying to use RegEx to parse the data out into columns and I'm struggling. Basically, the data looks a little like this:
Year -1 A/C Title Quantity Debit Credit
(17,673.24) 101 Regular Fees 113,101.01
(24,368,757.31) 102 Local - Sub-Contractor Fees 36,281,323.80
204 Contractors' remuneration
3,853.35 212 Cleaning 3,934.79
I want to parse it out into 6 columns (Year -1, A/C, Title, Quantity, Debit and Credit), but not every column contains data. Furthermore, the Title column in the PDF contains some special characters ($-'~/() and even numbers). Can anyone help with the RegEx? I've got it close (using regexr.com to build the expression), but it's not quite working properly, and when I try inputting it into Alteryx, it gives me some errors.
Expression used: (\d\d\d)+(\s{2,})+([\d\-a-zA-Z'/()&~$]*\s{1}[\d\-A-Za-z'/()&~$]*)+(?:\s{1}[\d\-A-Za-z'/()&~$]*)?
Any help would be greatly appreciated!
Hi @nic_hamley ,
Would you be able to share a txt file with a lot more rows to check all possible configurations, please?
That way, I can try to build a regex that matches every single line.
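In the meantime, here is a minimal Python sketch of one possible approach, based only on the four sample rows above, so treat it as a starting point and not a finished solution. It assumes that every amount has thousands separators and exactly two decimals (optionally wrapped in parentheses), that the A/C code is always three digits, and that the Title is whatever sits between the code and any trailing amounts. The names `AMOUNT`, `LINE_RE` and `parse_line` are made up for illustration. Because the column spacing is lost in the extracted text, the sketch does not try to decide whether a trailing amount belongs to Quantity, Debit or Credit; it just returns them as a list.

```python
import re

# Assumption: amounts look like 3,853.35 or (17,673.24), always with two decimals.
AMOUNT = r"\(?\d{1,3}(?:,\d{3})*\.\d{2}\)?"

LINE_RE = re.compile(
    rf"^\s*(?P<prior_year>{AMOUNT})?\s*"   # optional "Year -1" amount
    rf"(?P<account>\d{{3}})\s+"            # assumption: 3-digit A/C code
    rf"(?P<title>.+?)"                     # title, matched lazily
    rf"(?P<amounts>(?:\s+{AMOUNT})*)\s*$"  # zero or more trailing amounts
)

def parse_line(line):
    """Return a dict of fields, or None if the line is not a data row."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "prior_year": m.group("prior_year"),   # None when the column is empty
        "account": m.group("account"),
        "title": m.group("title"),
        "amounts": m.group("amounts").split(),  # not yet assigned to columns
    }

rows = [
    "Year -1 A/C Title Quantity Debit Credit",
    "(17,673.24) 101 Regular Fees 113,101.01",
    "(24,368,757.31) 102 Local - Sub-Contractor Fees 36,281,323.80",
    "204 Contractors' remuneration",
    "3,853.35 212 Cleaning 3,934.79",
]
for row in rows:
    print(parse_line(row))
```

The header row returns `None` because it does not contain a three-digit code in the expected position. The pattern is written for Python's `re` module, so it would need to be checked and probably adapted before being pasted into the Alteryx RegEx tool.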
Best,
Fernando Vizcaino