Alteryx Designer Desktop Discussions

SOLVED

Help with Correct regex

HW1
9 - Comet

I am trying to parse a PDF, but I am unable to get all the records parsed the way I want.

Apologies, I am really weak at regex, so I have not been able to get exactly the output I need.

I have attached the expected output alongside the actual output from the PDF. The missing fields are highlighted in yellow in the attached xlsx file.

Help would be greatly appreciated.

Also, any tips or tutorials on how I can develop my regex skills would be a huge help.

Thanks.

4 REPLIES
Qiu
21 - Polaris

@HW1 

How come 'BP Warrego Highway' is not an address while '1505 Warrego Highway' is one?

HW1
9 - Comet

@Qiu Yep, you are correct.
But 'BP Warrego Highway' is the name of the site, not the address. All addresses start with a number, so it is correct not to identify it as an address.
Similarly, 'Med-X Clarence Correctional Centre' is the name of a site and not an address; its address is in the next row.

Can you help me with the correct regex in this case?

Qiu
21 - Polaris

@HW1 

Have something for you.

Please note that it is highly sensitive to your input.

0209-HW1.PNG
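
The workflow in the screenshot is not reproduced here. As a rough illustration of the rule discussed above (addresses start with a number, site names do not), here is a minimal sketch in Python rather than in the RegEx tool. The pattern and the `classify` helper are assumptions made for this example, not the expression used in the attached workflow, and like any such pattern they depend on the input lines being cleanly separated.

```python
import re

# Assumption: an address line begins with a street number such as
# "1505", "12A" or "3/45", followed by whitespace and more text.
ADDRESS_RE = re.compile(r"^\s*\d+[A-Za-z]?(?:[/-]\d+[A-Za-z]?)?\s+\S")


def classify(line: str) -> str:
    """Return 'address' if the line starts with a number, else 'site_name'."""
    return "address" if ADDRESS_RE.match(line) else "site_name"


# Sample lines taken from this thread.
lines = [
    "BP Warrego Highway",
    "1505 Warrego Highway",
    "Med-X Clarence Correctional Centre",
]

for line in lines:
    print(f"{classify(line):10} {line}")
```

Anything that does not start with a digit is treated as a site name here, so a differently formatted invoice could easily break the rule.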

HW1
9 - Comet

@Qiu Thank you!

This one was not parsed.

I now have an understanding of how the regex tool works, though a very faint one. To be honest, I get really anxious when I work with regex. If you can point me to a tutorial or other learning resource that would help me get over the fear of working with it, I will be ever so grateful!

I always go to elaborate lengths to identify and fix parsing problems with PDFs, as their format is ever-changing. Almost every PDF invoice is different, even when it comes from the same source.
As you rightly said, the solution is highly sensitive to the input, and unfortunately the input changes every time, even from file to file.
But can one build a generalised solution?
