Alteryx Designer Desktop Discussions

SOLVED

RegEx help - parsing out PDF input data

nic_hamley
7 - Meteor

Hi there,

 

I've input a PDF using the PDF Input macro from the Gallery (which is really good). However, I'm trying to use RegEx to parse the data out into columns and I'm struggling. Basically, the data looks a little like this:

 

Year -1        A/C  Title                                             Quantity         Debit             Credit

(17,673.24) 101    Regular Fees                                                                            113,101.01

(24,368,757.31) 102     Local - Sub-Contractor Fees                    36,281,323.80

204    Contractors' remuneration

3,853.35 212    Cleaning                                                            3,934.79

 

I want to parse it out into 6 columns (Year -1, A/C, Title, Quantity, Debit and Credit), but not every column contains data in every row. Furthermore, the Title column in the PDF contains some special characters ($-'~/()) and even numbers. Can anyone help with the RegEx? I've got it close (using regexr.com to build the expression), but it's not quite working properly, and when I try entering it into Alteryx it gives me some errors.

Expression used: (\d\d\d)+(\s{2,})+([\d\-a-zA-Z'/()&~$]*\s{1}[\d\-A-Za-z'/()&~$]*)+(?:\s{1}[\d\-A-Za-z'/()&~$]*)?

 

Any help would be greatly appreciated!
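
For reference, here is a minimal sketch in Python of one way to match the four sample rows above. It is not the expression from the post, and it has not been tested in the Alteryx RegEx tool. It assumes every data row has a 3-digit A/C code followed by at least two spaces, that the title never contains two consecutive spaces, and that a row carries at most one trailing amount. The group names (`year_prev`, `account`, `title`, `amount`) and the `parse_line` helper are made up for illustration. Note that putting a `+` after a capture group, as in `(\d\d\d)+`, keeps only the last repetition, so the sketch quantifies inside the groups instead.

```python
import re

# Assumed row layout: optional "Year -1" amount, 3-digit A/C code,
# title, then at most one trailing amount (Debit or Credit).
LINE_RE = re.compile(
    r"""
    ^\s*
    (?:(?P<year_prev>\(?[\d,]+\.\d{2}\)?)\s+)?   # optional prior-year amount, possibly in parentheses
    (?P<account>\d{3})\s{2,}                      # 3-digit account code, then two or more spaces
    (?P<title>.+?)                                # title: any characters, as few as possible
    (?:\s{2,}(?P<amount>[\d,]+\.\d{2}))?          # optional trailing amount after two or more spaces
    \s*$
    """,
    re.VERBOSE,
)


def parse_line(line):
    """Return a dict of the named groups, or None if the line is not a data row."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = [
        "(17,673.24) 101    Regular Fees                    113,101.01",
        "(24,368,757.31) 102     Local - Sub-Contractor Fees        36,281,323.80",
        "204    Contractors' remuneration",
        "3,853.35 212    Cleaning                3,934.79",
    ]
    for row in sample:
        print(parse_line(row))
```

The header row does not start with a 3-digit code, so `parse_line` returns `None` for it. The sketch cannot tell whether the trailing amount is a Debit or a Credit; that depends on where the number sits on the line, not on its text.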

 

 

3 REPLIES
fmvizcaino
17 - Castor

Hi @nic_hamley ,

 

Would you be able to share a txt file with a lot more rows to check all possible configurations, please?

That way, I can try to build a regex that matches every single line.

 

Best,

Fernando Vizcaino

DavidP
17 - Castor

My first suggestion would be to split the column headers from the data; it's much easier to parse that way. Once done, you can recombine them with a Union tool.

 

I'll play around more with the 2nd part when I get a chance, but this should get you going.

 

pdf regex.png

DavidP
17 - Castor

OK, here's my best effort. My regex parses out the columns, but it can't put debit and credit in the correct columns. Also, it looks like there's no quantity field in the data.
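
One possible way around the debit/credit limitation, sketched in Python rather than as an Alteryx workflow: if the text extracted from the PDF keeps its column alignment (an assumption that may not hold for every PDF), the position of the trailing amount can be compared with the positions of the header labels. The helper name `classify_amount` is hypothetical, and the sketch assumes amounts are right-aligned under their headers.

```python
import re

# Trailing amount such as 3,934.79 at the end of a row.
AMOUNT_RE = re.compile(r"([\d,]+\.\d{2})\s*$")


def classify_amount(header, line, columns=("Quantity", "Debit", "Credit")):
    """Guess which header column the trailing amount belongs to.

    Assumes right-aligned numbers: the amount is assigned to the header
    label whose right edge is closest to the amount's right edge.
    Returns None when the row has no trailing amount.
    """
    match = AMOUNT_RE.search(line)
    if match is None:
        return None
    amount_end = match.end(1)
    # Right edge (exclusive offset) of each label in the header row.
    edges = {name: header.index(name) + len(name) for name in columns}
    return min(edges, key=lambda name: abs(edges[name] - amount_end))


if __name__ == "__main__":
    # Synthetic, perfectly aligned rows built for illustration only.
    header = f"{'Year -1':<15}{'A/C':<5}{'Title':<40}{'Quantity':>10}{'Debit':>16}{'Credit':>16}"
    debit_row = f"{'3,853.35':<15}{'212':<5}{'Cleaning':<40}{'':>10}{'3,934.79':>16}"
    print(classify_amount(header, debit_row))
```

If the PDF extraction collapses or shifts whitespace, the nearest-edge guess becomes unreliable and the column boundaries would need to be set by hand.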

 

pdf regex.png
