This is my first post, and i have only used Alteryx a few times for small solutions, so im really just after a pointer in the right direction.
A little back ground I have an extract from one of our systems that basically holds the text of a conversation. In order to mine this data we are currently sending it out of the business to a third part for processing. We have to do a cleanse where any identifiable data is removed before sending. Now historically i did this using Parse and the RegEx tool which swapped all numbers [0-9] to 9, and that was acceptable.
So if as an example the original text was
I have called 2 times on the 6th and 7th, and the order is 12 hours overdue please refund my credit card 1234-4567-7894-3214
It became
I have called 9 times on the 9th and 9th, and the order is 9 hours overdue please refund my credit card 9999-9999-9999-9999
Now i have been asked to let some numbers through such as (where x is the number)
x times
xth
x hours
x substitutions
so the new cleaned string would be
I have called 2 times on the 6th and 7th, and the order is 12 hours overdue please refund my credit card 9999-9999-9999-9999
I know i can solve this using code (.net etc..) but can something be done in Alteryx? Is there a way to identify multiply matches in a string and then not parse them, but instead parse the remaining numbers to a 9.
Any help for this newbie would be appreciated
Thanks
Spence
I
Solved! Go to Solution.
I would suggest using the same methods that you are now, Regex and Parse but refining the Regex to only identify cases that need substitution such as credit card numbers, phone numbers, etc.
I have a crude RegEx that works for this case and is attached. Right now, this will only work for card numbers in the format of 4digit-4digit-4digit-4digit. This will not work for something like an Amex, but thought it might give you some direction to see one example.
Yes that would be the simple method , but the problem comes mainly from the security side I.e. its business rule issue not a developer issue or at least i hope its not
We have to specify what data is going out of the business and get security to sign off, because we now need to add these extra elements of data to do deeper analysys at the third party, we have to be specific about what we are letting through, not what we are blocking . Also whilst credit cards, telephone numbers, and emails etc.. are fairly easy to find. Something such as account number just a list of 8 numbers is harder to pickout , and there is no way to ever bottom out the list of options to parse out
I know its a bit left field, but its just one of those situations, And i was hoping there might be a way in Alteryx so i didnt have to use code.
Thanks JoeM
I have a good Regex that covers all credit cards i borrowed from this site http://www.regular-expressions.info/creditcard.html
but its the same problem i need to define what im letting through, not what im blocking out
I think the regex you are looking for is this:
\d(?!(\d* times|\d*th|\d* hours|\d* substitutions))
In a regex replace with replacement text 9
The key is the the negative look ahead ?!
Check out this link for the full explanation of how it works
https://regex101.com/r/xM3jD7/1
AdamR Thats the solution i was after. I never knew you could do a negative search with regex, and they way you showed the solution on that website was really helpfull. I think i might borrow that method myself.
Thanks to all for helping the newbie out.