Multi row formula not working
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I am trying to solve for a condition where for a given partition of rows if number in row1 and row2 has a matching 6 or more digit sequence then mark the second row as 0 in a new field.
I used multi row formula as shown below:
If [Name] = [Row-1:Name] and
(REGEX_Match([Number], '.*(\d{6}).*') = REGEX_Match([Row-1:Number], '.*(\d{6}).*'))
then
0
Else
1
Endif
Sample input
Name | Phone |
Oscar | 234567890 |
Oscar | 345678901 |
Zoro | 1122333444 |
Zoro | 9988776600 |
expected output:
Name | Phone | result |
Oscar | 234567890 | 1 |
Oscar | 345678901 | 0 |
Zoro | 1122333444 | 1 |
Zoro | 9988776600 | 1 |
but I am getting
Name | Phone | result |
Oscar | 234567890 | 0 |
Oscar | 345678901 | 1 |
Zoro | 1122333444 | 0 |
Zoro | 9988776600 | 1 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @Yuri24 - you've posted this as a comment here: https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Compare-data-in-a-parition/m-p...
Just tidying up the logic
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @Yuri24 can you confirm the logic below?
If you are saying 6 digits repetition, are you saying for BOTH in the cell repetition and count of repetition across two rows?
In addition, will there by cases whereby a name exists more than 2 rows?
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Based on your logic, shouldn't it be:
Sample input
Name | Phone |
Oscar | 234567890 |
Oscar | 345678901 |
Zoro | 1122333444 |
Zoro | 9988776600 |
expected output:
Name | Phone | result |
Oscar | 234567890 | 1 |
Oscar | 345678901 | 1 |
Zoro | 1122333444 | 1 |
Zoro | 9988776600 | 1 |
For Oscar, it's between rows having 34567890 as matching values.
For Zoro, there's 9 duplicate numbers within each of their rows.
Something doesn't line up in your logic. Can you give more examples and check again?
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@caltang Second row for oscar should be 0 since the 6 or more digit sequence number matches for oscar's second row number. Basically looking for any 6digit or more sequence matches in both rows.
In zoro's case there is no match of a 6 or more digit number or sequence hence both rows should be 1. Yes there could be cases where there are more than 2 rows for same name
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
try to find a regex, no luck on it. hence here the super long ways.
1. duplicate each rows. (as 1st row compare to 2nd row and 2nd row compare to 3rd row, hence 2nd require twice)
2. assign compareID (1st vs 2nd as 1, 2nd vs 3rd as 2, etc...)
3. duplicate len-6 rows (to generate all possible any 6 char)
4. sumarize with compareID and 6char. if more than 2. then it match.
5. use join, union, unique etc. to return to clean table.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Took me awhile @Yuri24 but I got something:
Major caveats:
- My method works because I'm assuming the data size you're working with is not that huge (sub 10k). Because I'm using REGEX to split your phone number to Rows, each person will have 10 rows. So if you have 10k customers, you will have 10k x 10 = 100k rows. It gets bigger and more exponential as you increase the person count, and the frequency of which they appear in the data.
- Now, I am only using Name Sequence (Their appearance # - for instance, Oscar appears twice and Zoro appears twice. So 1st and 2nd for each of their unique names). Because of the Name Sequence, I am currently only accounting for 2 rows at a time. The more rows you have, the more you have to add Filter tools and Summarize + Join Tools before the 2nd last Union tool. This will get you the proper count and if they are J, it means the appearance after the 1st will be flagged.
- So in this case, I've added a third row for you for now to test. If you have scenarios that have people with more than 3 phone numbers then you need to add more Filters.
I understand that this may not be a fully dynamic solution nor the best, but for its business use case and logic, I thought of this to achieve what you want as you can see in the results pane.
Let me know if this works.
Best,
Calvin
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@PangHC , @caltang Thanks. However, if there are continuous numbers irrespective of the actual numbers it seems to pickup the next row as a match. For example if I add 11111111111 or 9999999999 next to zoro it is considering that number as a match too in both workflows.
I am wondering why the multi row formula is not working.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hmm, perhaps you can give us more data and more expected output, then we will use the multi-row again to see.
Otherwise your 2 examples are not enough... because as you confirmed, there are scenarios where there are more than 2 rows.
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
hey @Yuri24 please find attached a workflow that solves your problem
there's two options: numbers should only match the first number or numbers should match the previous number
I wasn't sure which one you wanted so I included both (probably the second?)
also regarding your question why the multi row formula is not working as you intended - the regex_match function only returns TRUE or FALSE, not the actual matched sequence. So your formula just looks if there are 6 digits in the row above it and if there are 6 digits in its own row, but not what digits are in those rows.
I am not sure if there is a way to call back to capture groups from one expression to another expression, so my workflow is a short workaround for that
