community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Knowledge Base

Definitive answers from Designer experts.

How to Trim Numbers: Regex to the Rescue

Alteryx
Alteryx
Created on

Let's say you have data like this and you'd like to remove the numbers at the end: 

Blue4509

Yellow2345

Orange2315

Blue6754

 

 

You could use Left(x, len), like this: left([Data], (length([Data]) - 4))

(This calculates the length of the field, subtracts 4 and then takes that many characters from the left. E.g.: length(Blue4509) = 8, 8-4 = 4, left(Blue4509,4) = Blue)

 

Or a combination of ReverseString(Str) and Substring(x, start, length)ReverseString(substring(ReverseString([Data]),4,100))

(This reverses the string, uses a substring to start at the 5th position [positions are zero based], takes the next 100 characters, and then reverses it back. E.g.: reversestring(Blue4509) = 9054eulB, 5th position = e, next 100 characters = eulB, reversing it back results in Blue)

 

 

BUT what if your data looks like this: 

Blue4509
Yellow2345
Orange231
Blue6754
Green596828

 

The above won't work anymore, but ReplaceChar(x, y, z) will: replacechar([Data], '0123456789','')

(ReplaceChar(x,y,z) returns the string [x] after replacing each occurrence of the character [y] with the character [z].)

 

Or Regex_Replace(string, pattern, replace,icase): regex_replace([Data],"\d","")

(Regex_Replace(string, pattern, replace,icase) returns the string after replacing the pattern with the replace. In this case, it replace all digits.)

 

Or the Regex Tool

 

10-31-2016 10-16-23 AM.png

\d in regular expression identifies digits (aka numbers)

 

 What if your data looks like this: 

1Blue4509
2Yellow2345
3Orange231
4Blue6754
5Green596828

 

and you want to preserve the number at the beginning of the string but remove the ones at the end? 

 

The $ character in regex denotes the end of the string, * identifies one or more instances of the preceding character, so \d*$ identifies one or more digits at the end of the string: regex_replace([Data], "\d*$","") or: 

10-31-2016 11-06-04 AM.png

 

 Also check out all the other functions on the Alteryx help page, POSIX Extended Regular Expression Syntax, more info on the Regex Tool, and the Regex Cheat Sheet on the community. 

Comments
Asteroid

Hi there --- this is so close to what I am trying to do but I'm struggling with getting the regex syntax correct. Here is what I am trying to convert: 

 

COURS-8651-2

COURS-6651-20

COURS-7786F-1

 

What I need is the following: 

COURS8651

COURS6651

COURS7786F

 

I can get the last number removed but I cannot seem to get the "-" removed in either position. How should I tweak this formula {regex_replace([Data], "\d*$","")} to acheive my results? 

 

Thank you! 

Asteroid

Hi there --- I actually resolved the message abovt but have another problem. I have data that looks like this: 

 

CRJS-9000-36/PPPA-9000A-36-Dissertation2016 Fall Qtr 08/29-11/20-PT27
HUMN-9001-48/COUN-8560-48/SOCW-9000-48-Dissertation2016 Fall Qtr 08/29-11/20-PT27

 

I need to remove everything that follows the section number (highlighted in red). As you can see, some will have two courses and others will have three courses so I cannot use LEN syntax. Since the information is words, numbers and a date along with the leading "-", I am really struggling with the regex syntax to remove that information. 

 

Any help you might toss my way would certainly be appreciated. 

 

Thanks, 

--Kim

Alteryx
Alteryx

@KAFord Kim,  is it just a matter of finding the section number preceeded and followed by a dash e.g. -\d\d-  ?   

 

This will parse the two parts of the larger string

 

2017-01-06_17-25-40.png

 

 

Asteroid

I will try that particularly since not all of the courses will have the word "dissertation" in them. In another file that is like this, the course name follows so while I found a way to remove the "dissertation" for the next file I'm working with I cannot. I actually used this using a filter: 

 

TRIM(Left([Course Name1], FindString([Course Name1], "-Dissertation")))

 

However, as you can see, it will only remove courses with that word so your method should take care of all of them.  I just started using Alteryx this week so I'm going through a steep learning curve particularly with creating the syntax for Regex. 

 

Thanks, Andy!!! 

 

 

Asteroid

Great work with regex!  I've run into an issue where I've parsed out an address and in the StreetName there are stings like '5th Avenue Apt Apt Apt.'  I want to do a Replace(<String>, "Apt", "") but there are a few that have '6th Ave Apt G' or 'Apt610' or 'AptA.'  How do I move those with the 'Apt' and a letter or number over the the SuiteName and SuiteNumber fields?  Can I use regex? Or is it just a can of worms?

 

Thanks,

Brad