I am using regex to parse my data. I currently use "tokenize" to pick out the parts I want to return, but I ran into a problem: the same alternatives give different results depending on whether a case is placed at the beginning or at the end of the expression. This is probably because the target field satisfies both cases, but regex returns only the first match (the less complicated case). If I reverse the order of the cases, it works well. Does this mean regex evaluates the alternatives in a fixed sequence? What should I write (I currently use "|" to separate them) so that I can include both cases without worrying about the order?
Also, I need to replace a field using new patterns. Is it possible to output the result as a new column without using the Formula tool? The Regex tool does not seem to work when some of the data follows a different pattern, and it can only replace the same field.
case 1: [[:alpha:]]{5,13}\d{4,7}|\w{13}-\w{1,3}
case 2: \w{13}-\w{1,3}|[[:alpha:]]{5,13}\d{4,7}
Thanks so much!
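I think the problem is that both of your expressions match the 6 example cases. Alternatives separated by "|" are tried from left to right, so whichever one is listed first and matches is the one returned.
[[:alpha:]]{5,13}\d{4,7} will match the leading 5 letters and then the next 7 digits.
The second, \w{13}-\w{1,3}, matches the first 13 letters or digits, a dash, and then 1-3 letters or digits.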
As you are matching the entire string, I suggest you add ^ and $ to each alternative so that it has to match the whole string:
^[[:alpha:]]{5,13}\d{4,7}$|^\w{13}-\w{1,3}$
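To illustrate why the order matters and why anchoring removes the dependency, here is a minimal sketch in Python rather than in the Regex tool itself. It rests on two assumptions: Python's `re` module does not support the POSIX class `[[:alpha:]]`, so `[A-Za-z]` is used in its place, and the sample value `ABCDEF1234567-XY` is made up for the demonstration because the original example data is not shown here.

```python
import re

# Assumption: [A-Za-z] stands in for [[:alpha:]], which Python's re lacks.
ALPHA_DIGITS = r"[A-Za-z]{5,13}\d{4,7}"
CODE_SUFFIX = r"\w{13}-\w{1,3}"

# Hypothetical value that satisfies both alternatives when unanchored.
sample = "ABCDEF1234567-XY"

# Unanchored: the leftmost alternative that matches wins, so order matters.
case1 = re.search(f"{ALPHA_DIGITS}|{CODE_SUFFIX}", sample).group(0)
case2 = re.search(f"{CODE_SUFFIX}|{ALPHA_DIGITS}", sample).group(0)

# Anchored: each alternative must cover the whole string, so only one can
# succeed for this value and the order no longer changes the result.
anchored = rf"^{ALPHA_DIGITS}$|^{CODE_SUFFIX}$"
anchored_reversed = rf"^{CODE_SUFFIX}$|^{ALPHA_DIGITS}$"
anchored_result = re.search(anchored, sample).group(0)
anchored_reversed_result = re.search(anchored_reversed, sample).group(0)

print(case1)                     # shorter match from the first alternative
print(case2)                     # full value from the second alternative
print(anchored_result)           # full value
print(anchored_reversed_result)  # full value
```

With the anchors in place, the first alternative can no longer stop after the digits, so the value with the dash can only be matched by the second alternative.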
In terms of outputting a new column, the Regex tool in parse mode will extract marked groups. Annoyingly, the tool in replace mode overwrites the field and doesn't support a new column.
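As a rough analogy for what extracting marked groups into new columns looks like, here is a small Python sketch. It is not the Regex tool's own behaviour, only the same idea expressed in code. The column names `alpha_digits` and `code_suffix`, the helper `add_parsed_columns`, and the sample rows are all hypothetical, and `[A-Za-z]` again replaces `[[:alpha:]]`.

```python
import re

# Each alternative is wrapped in its own marked (capturing) group.
# Assumption: [A-Za-z] stands in for [[:alpha:]].
PATTERN = re.compile(r"^(?:([A-Za-z]{5,13}\d{4,7})|(\w{13}-\w{1,3}))$")

# Hypothetical rows; "field" plays the role of the input column.
rows = [
    {"field": "HELLO1234"},
    {"field": "ABCDEF1234567-XY"},
    {"field": "no match"},
]


def add_parsed_columns(row):
    """Return a copy of the row with one new column per marked group."""
    match = PATTERN.match(row["field"])
    out = dict(row)  # the original field is kept, not overwritten
    out["alpha_digits"] = match.group(1) if match else None
    out["code_suffix"] = match.group(2) if match else None
    return out


parsed = [add_parsed_columns(row) for row in rows]
for row in parsed:
    print(row)
```

Only the group belonging to the alternative that matched is filled in; the other new column stays empty, and rows that follow neither pattern keep their original field with both new columns empty.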
Kenneth,
If you want a little more control over the OR condition, you can FILTER with Regex_Match statements so that the field goes to the exact parsing logic that you want. Then you'll have the ability to meet your business requirements regardless of order.
Thanks,
Mark
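The filter-first approach suggested above can be sketched in Python as follows. This is only an illustration of the routing idea, not an Alteryx workflow: the function `route` and its labels are hypothetical, `re.fullmatch` plays the role of a whole-string Regex_Match test, and `[A-Za-z]` is assumed in place of `[[:alpha:]]`.

```python
import re

# Assumption: [A-Za-z] stands in for [[:alpha:]].
ALPHA_DIGITS = re.compile(r"[A-Za-z]{5,13}\d{4,7}")
CODE_SUFFIX = re.compile(r"\w{13}-\w{1,3}")


def route(value):
    """Decide which parsing branch a value belongs to (hypothetical labels)."""
    # fullmatch requires the whole value to fit the pattern. One pattern
    # needs a dash and the other cannot contain one, so at most one test
    # can succeed and the order of the checks does not affect the result.
    if CODE_SUFFIX.fullmatch(value):
        return "code_suffix"
    if ALPHA_DIGITS.fullmatch(value):
        return "alpha_digits"
    return "unmatched"


for value in ["ABCDEF1234567-XY", "HELLO1234", "x"]:
    print(value, "->", route(value))
```

Each branch can then apply its own parsing expression, so no single alternation has to cover both cases.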