SOLVED

Regex to find lines that are not useful data

Hiblet
10 - Fireball

I have some email data that I have split into rows, so that each line in the email is on its own record.

 

I am trying to remove lines that are like "========" or "---------" or similar, that are used as spacers.

 

I have tried both the Regex_Match() function and the RegEx tool to find strings that do not contain any letters or numbers. I am using "\w+", which the internet tells me matches strings of word characters. However, the expression only seems to return true for single words or single numbers.

 

In my investigations, I have used "[a-z]+" to try to match strings with any lower case characters, but even this does not work.  I must be doing something very stupid, but I have worked as a coder, and used regex a lot, in code and in Alteryx, so I cannot figure out why this is not working.

 

Thanks in advance for any help.

4 REPLIES
PhilipMannering
16 - Nebula

What about this expression in a Filter tool:

 

 

not regex_match([Email], '[ =-]+')
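
For anyone who wants to check the logic outside Alteryx, the filter above can be sketched in Python. This assumes `re.fullmatch` is a fair stand-in for REGEX_Match, which has to match the whole field; `is_spacer`, `sample` and `kept` are hypothetical names used only for illustration. In this engine a hyphen placed last inside the character class is a literal, so it needs no escaping.

```python
import re

# Same character class as the expression above: space, equals sign, hyphen.
# A hyphen placed last inside [...] is a literal, not a range operator.
SPACER = re.compile(r"[ =-]+")

def is_spacer(line: str) -> bool:
    """True when the whole line consists only of spaces, '=' or '-'."""
    return SPACER.fullmatch(line) is not None

# Made-up sample rows, one email line per entry.
sample = ["Hello Bob,", "========", "--------", "- = - =", "Total: 42"]

# Equivalent of the Filter tool's True output: keep lines that are not spacers.
kept = [line for line in sample if not is_spacer(line)]
print(kept)
```

If a spacer line still slips through, it may be built from a character that is not in the class, such as an underscore or a typographic dash, which would then need adding.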

 

 


 

Christina_H
14 - Magnetar

I would use Regex_Match([Email], '.*\w.*') to find strings containing at least one word character (a letter, digit, or underscore).
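
The difference between "\w+" and '.*\w.*' comes down to REGEX_Match needing the pattern to cover the entire string. A rough Python illustration follows, assuming `re.fullmatch` behaves comparably to REGEX_Match; `whole_match` is a hypothetical helper, not an Alteryx function.

```python
import re

def whole_match(pattern: str, text: str) -> bool:
    """Whole-string match, used here as a stand-in for REGEX_Match."""
    return re.fullmatch(pattern, text) is not None

# \w+ must cover the entire string, so a space or punctuation makes it fail.
print(whole_match(r"\w+", "hello"))           # True
print(whole_match(r"\w+", "hello world"))     # False

# .*\w.* allows anything before and after a single word character.
print(whole_match(r".*\w.*", "hello world"))  # True
print(whole_match(r".*\w.*", "========"))     # False
```

Since this version returns true for the lines worth keeping, it can be used in a Filter tool without a leading "not".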

Hiblet
10 - Fireball

Hi @PhilipMannering, thanks for that.  It did not seem to pick up lines that were all hyphens/dashes/minus signs.  This character is an active character for defining a range I think, so I tried escaping it with a backslash and still could not get it to go true for "----" type lines.  In the end I went with...

 

   REGEX_Match([Email], "^(.)\1*$")

 

I found it on the internet. It puts one character in a capture group and then looks for repeated instances of that character from the start to the end of the line, so it is quite flexible if people use other characters as spacers.
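
The capture-group pattern can be tried out in Python as a minimal sketch, again assuming `re.fullmatch` is a reasonable stand-in for REGEX_Match; `is_single_char_run` is a hypothetical name.

```python
import re

# (.) captures the first character; \1* then allows only repeats of it.
REPEATED = re.compile(r"^(.)\1*$")

def is_single_char_run(line: str) -> bool:
    """True when the line is one character repeated from start to end."""
    return REPEATED.fullmatch(line) is not None

# Made-up test lines covering spacers, text, and edge cases.
for line in ["--------", "========", "-=-=-=", "hello", "aaaa", "x", ""]:
    print(repr(line), is_single_char_run(line))
```

Two things to watch for: the pattern also matches a one-character line or a run of a single letter such as "aaaa", and it does not match mixed spacers such as "-=-=-=" or empty lines.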

 

Thanks for the reply.

Hiblet
10 - Fireball

Thanks @Christina_H, that worked.  I went with something slightly different (see above), but this is also good and will surely come in handy in future.

 

Much obliged!
