I am trying to solve for a condition where for a given partition of rows if number in row1 and row2 has a matching 6 or more digit sequence then mark the second row as 0 in a new field.
I used multi row formula as shown below:
If [Name] = [Row-1:Name] and
(REGEX_Match([Number], '.*(\d{6}).*') = REGEX_Match([Row-1:Number], '.*(\d{6}).*'))
then
0
Else
1
Endif
Sample input
Name | Phone |
Oscar | 234567890 |
Oscar | 345678901 |
Zoro | 1122333444 |
Zoro | 9988776600 |
expected output:
Name | Phone | result |
Oscar | 234567890 | 1 |
Oscar | 345678901 | 0 |
Zoro | 1122333444 | 1 |
Zoro | 9988776600 | 1 |
but I am getting
Name | Phone | result |
Oscar | 234567890 | 0 |
Oscar | 345678901 | 1 |
Zoro | 1122333444 | 0 |
Zoro | 9988776600 | 1 |
Hi @Yuri24 - you've posted this as a comment here: https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Compare-data-in-a-parition/m-p...
Just tidying up the logic
Hi @Yuri24 can you confirm the logic below?
If you are saying 6 digits repetition, are you saying for BOTH in the cell repetition and count of repetition across two rows?
In addition, will there by cases whereby a name exists more than 2 rows?
Based on your logic, shouldn't it be:
Sample input
Name | Phone |
Oscar | 234567890 |
Oscar | 345678901 |
Zoro | 1122333444 |
Zoro | 9988776600 |
expected output:
Name | Phone | result |
Oscar | 234567890 | 1 |
Oscar | 345678901 | 1 |
Zoro | 1122333444 | 1 |
Zoro | 9988776600 | 1 |
For Oscar, it's between rows having 34567890 as matching values.
For Zoro, there's 9 duplicate numbers within each of their rows.
Something doesn't line up in your logic. Can you give more examples and check again?
@caltang Second row for oscar should be 0 since the 6 or more digit sequence number matches for oscar's second row number. Basically looking for any 6digit or more sequence matches in both rows.
In zoro's case there is no match of a 6 or more digit number or sequence hence both rows should be 1. Yes there could be cases where there are more than 2 rows for same name
try to find a regex, no luck on it. hence here the super long ways.
1. duplicate each rows. (as 1st row compare to 2nd row and 2nd row compare to 3rd row, hence 2nd require twice)
2. assign compareID (1st vs 2nd as 1, 2nd vs 3rd as 2, etc...)
3. duplicate len-6 rows (to generate all possible any 6 char)
4. sumarize with compareID and 6char. if more than 2. then it match.
5. use join, union, unique etc. to return to clean table.
Took me awhile @Yuri24 but I got something:
Major caveats:
I understand that this may not be a fully dynamic solution nor the best, but for its business use case and logic, I thought of this to achieve what you want as you can see in the results pane.
Let me know if this works.
Best,
Calvin
@PangHC , @caltang Thanks. However, if there are continuous numbers irrespective of the actual numbers it seems to pickup the next row as a match. For example if I add 11111111111 or 9999999999 next to zoro it is considering that number as a match too in both workflows.
I am wondering why the multi row formula is not working.
Hmm, perhaps you can give us more data and more expected output, then we will use the multi-row again to see.
Otherwise your 2 examples are not enough... because as you confirmed, there are scenarios where there are more than 2 rows.
hey @Yuri24 please find attached a workflow that solves your problem
there's two options: numbers should only match the first number or numbers should match the previous number
I wasn't sure which one you wanted so I included both (probably the second?)
also regarding your question why the multi row formula is not working as you intended - the regex_match function only returns TRUE or FALSE, not the actual matched sequence. So your formula just looks if there are 6 digits in the row above it and if there are 6 digits in its own row, but not what digits are in those rows.
I am not sure if there is a way to call back to capture groups from one expression to another expression, so my workflow is a short workaround for that