Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Name Cleansing

spencer046
8 - Asteroid

Hello, 

 

I have the below sample dataset for company names. My objective is to get rid of the 2/3/4 letter words from the names, except the first word AMD (it can't be removed). Second, I need to have only two words in the name column. For example, the 4th name AMD CENTER AT TYSONS CORNER would only have AMD CENTER. Please share your thoughts. 

 

AMD
AMD AMDS CO DDC LA
AMD C O CDS PROJECTS
AMD CENTER AT TYSONS CORNER
AMD CO CD
AMD CO CDS
AMD CO CDS PROJECTS
AMD CO DDC LA
AMD CO DEPENDABLE CO
AMD INC
AMD CO DEPENDABLE DIST CENT
AMD CO DEPENDABLE DISTRI CTR
AMD INC CO CDS SHOP
AMD INCCO CDS

9 REPLIES 9
mbarone
16 - Nebula
16 - Nebula

Have you tried the text-to-columns tool using a space as a delimiter?  I'm not 100% clear on your requirements, but it seems that would be a good place to start.

pedrodrfaria
13 - Pulsar

Hi @spencer046 

 

Please see attached for the workflow below as the solution:

 

Using the logic you mentioned. It leaves us with some fields only containing AMD.

 

 

pedrodrfaria_0-1609794826498.png

 

You can use a filter and filter out these fields with only AMD.

pedrodrfaria_1-1609794881299.png

Additionally, it should be asked, what is the logic to leave only 2 words? Would it be the first two words of the text? This was the logic I used.

 

Please mark it as a solution for your discussion post if this post answers your question.

 

 

Pedro.

 

jalizmaldonado
8 - Asteroid

Hi, I hope all is well. 

 

I'm not very experienced yet but I believe you can add the formula tool and create a conditional statement that removed all language outside of AMD. Then use the Data Cleansing Tool to remove any whitespace. Then join any additional language through another Formula tool. If you have a a data set you can share, I can try to complete it on my end. 

 

Best, 

 

Jaliz Maldonado

jalizmaldonado
8 - Asteroid

Conditional statements include "ToString" when combining and "Replace" when removing.

spencer046
8 - Asteroid

@pedrodrfaria  thanks a lot, this is exactly what I am looking for.

 

why the filter tool filters out the sole AMD names?

 

The reason for keeping two words is to maintain simplified version of names in a comparatively larger dataset. 

spencer046
8 - Asteroid

@pedrodrfaria for the next phase, I have another issue. The solution works only for a particular beginning AMD. Now I have the below dataset:

 

AMD CO DIST CENTER

Bi Tech Industries

CZ INDUSTRIES

D I S COMPANY LTD

A AND B HOMES CO

 

I still want to get rid of the 2/3/4 letter words from the names, leaving the first part of the names out of the filter/formula. For example, D I S COMPANY LTD would need to become D I S COMPANY, and A AND B HOMES CO need to become A AND B HOMES. 

 

How to adjust the workflow then? Also if you can explain the Regex tool that you used that would be great!

 

pedrodrfaria
13 - Pulsar

Hi @spencer046 

 

Please always open a new post if you have a different question. This way, someone else that may have a similar questions will be able to find the new discussion post and the provided solution in the new post. And we that support you get credit for an additional response. Feel free to create a new post to document this question as well.

 

In the Regex Tool I'm using this function (\w*\s*\w*)

 

\w = word

\s = space

* = zero or more

 

so I'm looking at fields that have zero or more words before a space, then zero or more spaces before a word, and then  zero or more words before a space.

 

Also, regarding your question, what is the logic to not consider the beginning of the field? After the first word? Only consider the last word?

 

Attached is a workflow that removes the last word of the field if it matches the conditional.

pedrodrfaria_0-1609812846140.png

 

Is this what you are looking for?

 

Pedro.

MarqueeCrew
20 - Arcturus
20 - Arcturus

@spencer046 

Using a single formula tool:

 

getword([field],0) +" "

+ regex_replace(trim(

           regex_replace(

                                substring([field],findstring([field],""))

,"\b\w{1,3}\b",'')

),"\s+",' ')

 

 it might just work

 

 cheers,

 

 mark (from my iPhone)

 

 

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
spencer046
8 - Asteroid

@pedrodrfaria Thank you so much, I have created a new post already according to your suggestion. I have below questions:

1. For the RegEx (thanks for explaining the expression), how its keeping 2 words, cutting the rest out?
2. For the logic, I myself am not sure. But for the below dataset

AMD CO DIST CENTER

Bi Tech Industries

CZ INDUSTRIES

D I S COMPANY LTD

A AND B HOMES CO

Hi Tech CO DIST Center

 

Below type of output I am actually expecting. My dataset has a lot of versatility in the names as you can see, so need an expert opinion to sort this out.

 

AMD CENTER

Bi Industries

CZ INDUSTRIES

D I S COMPANY

A AND B HOMES CO

Hi CENTER

Labels