Alteryx Designer Desktop Discussions

nic_hamley · 02-03-2020

Hi there,

I've read in a PDF using the PDF Input macro from the Gallery (which is really good), but I'm struggling to parse the data out into columns with RegEx. The data looks a little like this:

Year -1 A/C Title Quantity Debit Credit

(17,673.24) 101 Regular Fees 113,101.01

(24,368,757.31) 102 Local - Sub-Contractor Fees 36,281,323.80

204 Contractors' remuneration

3,853.35 212 Cleaning 3,934.79

I want to parse it out into 6 columns (Year -1, A/C, Title, Quantity, Debit and Credit), but not every column contains data on every row. Furthermore, the Title column in the PDF contains some special characters ($-'~/() and even numbers). Can anyone help with the RegEx? I've got it close (using regexr.com to build the expression), but it's not quite working properly, and when I try inputting it into Alteryx, it gives me some errors.

Expression used: (\d\d\d)+(\s{2,})+([\d\-a-zA-Z'/()&~$]*\s{1}[\d\-A-Za-z'/()&~$]*)+(?:\s{1}[\d\-A-Za-z'/()&~$]*)?

Any help would be greatly appreciated!

fmvizcaino · 02-03-2020

Hi @nic_hamley ,

Would you be able to share a txt file with a lot more rows to check all possible configurations, please?

That way, I can try to build a regex that matches every single line.

Best,

Fernando Vizcaino

DavidP · 02-04-2020

My first suggestion would be to split the column headers from the data, as it's much easier to parse that way. Once done, you can recombine them with a Union tool.

I'll play around more with the 2nd part when I get a chance, but this should get you going.
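
To illustrate the header/data split outside Alteryx, here is a minimal Python sketch. It assumes the header is the first non-blank line of the extracted text, which may not hold for multi-page PDFs where the header repeats; the function name `split_header` and the sample rows are only for illustration.

```python
# Sample lines copied from the question; the header comes first.
sample = [
    "Year -1 A/C Title Quantity Debit Credit",
    "",
    "(17,673.24) 101 Regular Fees 113,101.01",
    "",
    "204 Contractors' remuneration",
]


def split_header(lines):
    """Return (header, data_rows), treating the first non-blank line as the header."""
    rows = [line.strip() for line in lines if line.strip()]
    if not rows:
        return None, []
    return rows[0], rows[1:]


header, data_rows = split_header(sample)
```

The data rows can then be parsed on their own, and the header reattached afterwards.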

DavidP · 02-04-2020

OK, here's my best effort. My regex can parse out the columns, but it can't put debit and credit in the correct columns. Also, it looks like there's no Quantity field in the data.
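
The regex from this reply is not reproduced in the thread, so the following is only a rough Python sketch of one way to parse the sample rows, not the poster's actual expression. It assumes the A/C code is always exactly three digits, that amounts always have two decimal places, and that a trailing amount cannot be assigned to Debit or Credit from the text alone, so it is returned as a single `amount` field. The names `LINE_RE`, `to_number` and `parse_line` are hypothetical, and the `(?P<name>...)` group syntax is Python's; the Alteryx RegEx tool may need plain unnamed groups instead.

```python
import re
from decimal import Decimal

# Optional leading amount, 3-digit account code, free-text title,
# optional trailing amount. Amounts may be wrapped in parentheses.
LINE_RE = re.compile(
    r"^\s*"
    r"(?:(?P<prior>\(?[\d,]+\.\d{2}\)?)\s+)?"
    r"(?P<ac>\d{3})\s+"
    r"(?P<title>.*?)"
    r"(?:\s+(?P<amount>\(?[\d,]+\.\d{2}\)?))?"
    r"\s*$"
)


def to_number(text):
    """Convert '1,234.56' or '(1,234.56)' to a Decimal; parentheses mean negative."""
    if text is None:
        return None
    negative = text.startswith("(") and text.endswith(")")
    value = Decimal(text.strip("()").replace(",", ""))
    return -value if negative else value


def parse_line(line):
    """Return a dict of fields for a data row, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return {
        "year_minus_1": to_number(match.group("prior")),
        "ac": match.group("ac"),
        "title": match.group("title"),
        "amount": to_number(match.group("amount")),
    }


row = parse_line("(24,368,757.31) 102 Local - Sub-Contractor Fees 36,281,323.80")
```

Because the title is matched lazily and the amounts require a decimal point with two digits, numbers and punctuation inside the title stay in the title. The header row does not start with a three-digit code, so `parse_line` returns `None` for it.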
