Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Keeping some Strings and loosing others

SpenceInAsda
5 - Atom

This is my first post, and i have only used Alteryx a few times for small solutions, so im really just after a pointer in the right direction.

 

A little back ground I have an extract from one of our systems that basically holds the text of a conversation. In order to mine this data we are currently sending it out of the business to a third part for processing. We have to do a cleanse where any identifiable data is removed before sending. Now historically i did this using Parse and the RegEx tool which swapped all numbers [0-9] to 9, and that was acceptable.

 

So if as an example the original text was

     I have called 2 times on the 6th and 7th, and the order is 12 hours overdue please refund my credit card 1234-4567-7894-3214

It became

     I have called 9 times on the 9th and 9th, and the order is 9 hours overdue please refund my credit card 9999-9999-9999-9999

 

Now i have been asked to let some numbers through such as (where x is the number)

x times

xth

x hours

x substitutions

 

so the new cleaned string would be 

     I have called 2 times on the 6th and 7th, and the order is 12 hours overdue please refund my credit card 9999-9999-9999-9999

 

I know i can solve this using code (.net etc..) but can something be done in Alteryx? Is there a way to identify multiply matches in a string and then not parse them, but instead parse the remaining numbers to a 9.

 

Any help for this newbie would be appreciated

 

Thanks

Spence

 

 

 

 

7 REPLIES 7
michael_treadwell
ACE Emeritus
ACE Emeritus

I would suggest using the same methods that you are now, Regex and Parse but refining the Regex to only identify cases that need substitution such as credit card numbers, phone numbers, etc.

JoeM
Alteryx Alumni (Retired)

@SpenceInAsda

I have a crude RegEx that works for this case and is attached. Right now, this will only work for card numbers in the format of 4digit-4digit-4digit-4digit. This will not work for something like an Amex, but thought it might give you some direction to see one example.

SpenceInAsda
5 - Atom

Yes that would be the simple method , but the problem comes mainly from the security side I.e. its  business rule issue not a developer issue or at least i hope its not Smiley Happy

 

We have to specify what data is going out of the business and get security to sign off, because we now need to add these extra elements of data to do deeper analysys at the third party, we have to be specific about what we are letting through, not what we are blocking Smiley Frustrated. Also whilst credit cards, telephone numbers, and emails etc.. are fairly easy to find. Something such as account number just a list of 8 numbers is harder to pickout , and there is no way to ever bottom out the list of options to parse out

 

I know its a bit left field, but its just one of those situations, And i was hoping there might be a way in Alteryx so  i didnt have to use code.

MarqueeCrew
20 - Arcturus
20 - Arcturus
\d{4,16}

That would match 4 to 16 digits and replace with 9.

269-555-1234 becomes 269-555-9999
1234-5678-1234-5678 becomes 9999-9999-9999-9999

Would that help?

Mark

Sent from my iPhone
Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
SpenceInAsda
5 - Atom

Thanks JoeM

 

I have a good Regex that covers all credit cards i borrowed from this site http://www.regular-expressions.info/creditcard.html

 

but its the same problem i need to define what im letting through, not what im blocking out

 

 

AdamR_AYX
Alteryx Alumni (Retired)

I think the regex you are looking for is this:

 

\d(?!(\d* times|\d*th|\d* hours|\d* substitutions))

 

In a regex replace with replacement text 9

 

The key is the the negative look ahead ?!

 

Check out this link for the full explanation of how it works

 

 

https://regex101.com/r/xM3jD7/1

 

Adam Riley
https://www.linkedin.com/in/adriley/
SpenceInAsda
5 - Atom

Thats the solution i was after. I never knew you could do a negative search with regex, and they way you showed the solution on that website was really helpfull. I think i might borrow that method myself. Smiley Very Happy

 

Thanks to all for helping the newbie out. Robot Embarassed

 

Labels