Hi,
I'm struggling with a file of email addresses to clean up.
I come across fake emails where a letter can be repeated many times such as :
xxx@gmail.com
yyyidmcho@naver.com
xx.xxx@gmail.com
XXX@XXX
avaxxxava@gmail.com
xxxxx.yyyyy@hotmail.fr
xxxprettyboy@naver.com
lucaslucasxxx94@gmail.com
could you please help me with a regular expression to search all the emails with any letter repeated more than twice?
thank you
Solved! Go to Solution.
So there might very well be a more elegant way to do this, but RegEx is a tricky beast, and while ugly, this at least should work...
The general idea is to create a very long RegEx pattern that includes all the possible combinations of a single letter repeated 3 times, i.e. "aaa|bbb|ccc..." etc. The | pipe delimiter acting as the OR in this scenario. So I basically just listed all the letters a-z, then used a formula to create a field that repeated each letter 3 times, then concatenated using Summarize tool and a pipe delimiter (OR operator in the RegEx expression) with double quotes on each end.
This can then be used in the "pattern" section of a RegEx CountMatches expression, so it will count how many of those combinations it finds in your string. Using REGEX_CountMatches([Email],[Concat_LetterCombo],1) as the expression will allow for it to be case insensitive, so will work for AAA as well as aaa. Kinda ugly, but it works!
Cheers,
NJ
Here's an alternative:
 
					
				
				
			
		
