Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Name cleansing and Parsing

8 - Asteroid

I have the below dataset:

 

AMD CO DIST CENTER

Bi Tech Industries

CZ INDUSTRIES

D I S COMPANY LTD

A AND B HOMES CO

Hi Tech CO DIST Center

 

I want to get rid of the 2/3/4 letter words from the names, either from the end or in the middle, leaving the first part of the names out of the filter/formula. 

 

The output I am expecting:

 

AMD CENTER

Bi Industries

CZ INDUSTRIES

D I S COMPANY

A AND B HOMES CO

Hi CENTER

 

Can anyone help with a workflow? Thanks!

Alteryx Certified Partner
Alteryx Certified Partner

@spencer046 ,

 

You've got lots of name cleaning ahead of you.  BTW, AND violates your 2/3/4 letter words.

 

Replace(Trim(GetWord([Field1], 0) + 
RegEx_Replace(RegEx_Replace(Substring(Replace([Field1],"AND",'&'),FindString(Replace([Field1],"AND",'&'), " ")),"\b\w{2,4}\b",''),"\s+",' ')),"&",'AND')

 

This one nested expression will get you the output that you want, but you drop CO.  If you want to keep CO or other words, you should use a FIND Replace tool and make each KEY word a unique symbol (like I did with & for AND).  Use that replacement before the expression and then use another FIND REPLACE tool to convert back.

 

If you don't need to worry about these types of words, you can use:

Trim(GetWord([Field1], 0) + 
RegEx_Replace(RegEx_Replace(Substring([Field1]),FindString([Field1], " ")),"\b\w{2,4}\b",''),"\s+",' '))

 

Cheers,

 

Mark

 

 

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
8 - Asteroid

@MarqueeCrew  Awesome that's exactly what I am looking for, thanks a ton for your help. 

 

I am not too good at alteryx, so can you help me understand the RegEx formula by breaking it down? So I know what adjustments need to be made in future if I come across similar situation.

Alteryx Certified Partner
Alteryx Certified Partner

@spencer046 ,

 

\b is a word boundary

 \w is a word character

{2,4} qualifies the preceding attribute being 2 to 4 occurrences. 

any of your rules will have exceptions. You will throw good words away and keep bad words. Depending upon your data, you'll get a variety of results. One size does not fit all data. 

you need to develop an approach to cleaning. Easy too hard. Biggest bang first.  Once you've cleaned data, save your results. Next time you can look at a saved file and convert directly to the know good results. 

Business names are very tricky to work with. You need to find tools like regex and learn them. 

cheers,

 

 mark

 

 

 

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
8 - Asteroid

@MarqueeCrew what's the utility of the below?

"\s+",' '

 

Alteryx Certified Partner
Alteryx Certified Partner

Replace 1 or more spaces with a single space

 

Alteryx RegEx Beginner Tutorial
https://youtu.be/pTZj2U2SDFA

cheers

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
How to use tokens to look for patterns in your data and make desired replacements. MarqueeCrew takes you on a guided journey of 20 examples in roughly 20 mi...
8 - Asteroid

@MarqueeCrew  Thanks a lot. With your formula, its working properly, but at the same time, its truncating some words as shown below in the output. Any thoughts? I am using the formula that does not take AND into consideration. 

 

APPLIANCE BUILDER WHOLESALE --> APPLIANCE BUILDER

APPLIANCE CONNECTION INC --> APPLIANCE CONNECTI

APPLIANCE DISCOUNTERS --> APPLIANCE DISCOUNT

CAPITAL DISTRIBUTING --> CAPITAL DISTRIBUTI

ADP CO DEPENDABLE DISTRIBUTION CENTERS --> ADP DEPENDABLE D

Alteryx Certified Partner
Alteryx Certified Partner
Trim(GetWord([Field1], 0) + 
RegEx_Replace(RegEx_Replace(Substring([Field1],FindString([Field1], " ")),"\b\w{2,4}\b",''),"\s+",' '))
Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
8 - Asteroid

@MarqueeCrew I am using that formula but still getting those truncates.

Alteryx Certified Partner
Alteryx Certified Partner

I wonder if you're defining the field with a short length?

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
Labels