Alteryx Designer Desktop Discussions

Cfdiaz2103 · ‎06-24-2022

Hello team,

Currently I'm working on a workflow that should be able to identify an invoice number among tons of information (text and numbers) and split it into columns. I know that it could be done with a RegEx tool, but I have not been able to identify a pattern to tokenize this specific data.

Here, I'm sharing with you a sample of the current data and the desired output. Note that the invoice number I would like to split is colored in red.

I'd appreciate any kind of support you can give me.

Best wishes.

IraWatt · ‎06-24-2022

Hey @Cfdiaz2103,

I don't know what logic you would apply here:

"ANDRES RODRIGUEZ FORERO FC 1777, ARRIENDO OFICINA 31 Y PARQUEADERO MARZO 2022;"

How can we tell 2022 is not an invoice number?

Cfdiaz2103 · ‎06-24-2022

Hello @IraWatt!

Well, I believe the invoice number is always preceded by a prefix such as FAC, FC, LIQ or N°. It would be great if I could first extract the invoice number together with its prefix and then trim the prefix, but I have no idea how to do that.

Thanks!

DataNath · ‎06-24-2022

@Cfdiaz2103 if that's an exhaustive list of prefixes, then this will work. It's also dynamic, so it doesn't matter how many invoice numbers are in the column. If you use the tokenize method and split to columns instead, you need to specify how many columns to create. If the number of invoice numbers varies, you'll end up losing data (e.g. if your tokenize is set to 4 columns and there are 9 invoice numbers in a field, it would drop everything after the 4th).
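
To make the data-loss point concrete outside Alteryx, here is a rough Python sketch. The pattern (a prefix from earlier in the thread, optional whitespace, then digits) and the helper names `split_to_columns` and `split_to_rows` are assumptions for illustration. The thread does not say which mechanism the workflow uses to stay dynamic; one entry per match is just one way to picture it.

```python
import re

# Assumed pattern: one of the prefixes from the thread, optional whitespace, digits.
PATTERN = re.compile(r"(?:FAC|FC|LIQ|N°)\s*(\d+)")

def split_to_columns(text, n_columns):
    """Mimic a fixed split: keep at most n_columns matches, pad with None."""
    matches = PATTERN.findall(text)
    kept = matches[:n_columns]
    return kept + [None] * (n_columns - len(kept))

def split_to_rows(text):
    """Mimic a dynamic split: one entry per match, however many there are."""
    return PATTERN.findall(text)

# Made-up field containing 9 invoice numbers.
field = " ".join(f"FC {1000 + i}" for i in range(9))

print(split_to_columns(field, 4))  # only the first 4 survive
print(split_to_rows(field))        # all 9 are kept
```

With a fixed count of 4, the last five numbers in the made-up field are silently discarded, which is the situation described above.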

If you have more possible prefixes, just add them to the (?:FAC|FC|LIQ|N°) part of the expression in the RegEx with a '|' before them. For example, if you needed to add 2 more prefixes of ABC and XYZ, this would become (?:FAC|FC|LIQ|N°|ABC|XYZ):
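
For anyone who wants to try the idea outside Alteryx, here is a minimal Python sketch of the same alternation. Only the `(?:...)` group is quoted in this post, so the `\s*(\d+)` tail (optional whitespace, then digits) is an assumption, as is the helper name `extract_invoice_numbers`. ABC and XYZ are the two example prefixes from above, not real ones.

```python
import re

# Prefixes named in the thread; ABC and XYZ are the hypothetical additions.
PREFIXES = ["FAC", "FC", "LIQ", "N°", "ABC", "XYZ"]

# Assumption: a prefix is followed by optional whitespace and then digits.
# The capture group returns the number without its prefix.
PATTERN = re.compile(
    r"(?:" + "|".join(re.escape(p) for p in PREFIXES) + r")\s*(\d+)"
)

def extract_invoice_numbers(text):
    """Return every digit run that directly follows one of the prefixes."""
    return PATTERN.findall(text)

sample = ("ANDRES RODRIGUEZ FORERO FC 1777, ARRIENDO OFICINA 31 "
          "Y PARQUEADERO MARZO 2022;")
print(extract_invoice_numbers(sample))  # ['1777']
```

Because 31 and 2022 are not preceded by a listed prefix, they are skipped. Note that this sketch would also match a prefix that happens to end a longer word, so a stricter pattern may be needed for real data.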