Hello,
I have the below sample dataset for company names. My objective is to get rid of the 2/3/4 letter words from the names, except the first word AMD (it can't be removed). Second, I need to have only two words in the name column. For example, the 4th name AMD CENTER AT TYSONS CORNER would only have AMD CENTER. Please share your thoughts.
AMD
AMD AMDS CO DDC LA
AMD C O CDS PROJECTS
AMD CENTER AT TYSONS CORNER
AMD CO CD
AMD CO CDS
AMD CO CDS PROJECTS
AMD CO DDC LA
AMD CO DEPENDABLE CO
AMD INC
AMD CO DEPENDABLE DIST CENT
AMD CO DEPENDABLE DISTRI CTR
AMD INC CO CDS SHOP
AMD INCCO CDS
Solved! Go to Solution.
Have you tried the text-to-columns tool using a space as a delimiter? I'm not 100% clear on your requirements, but it seems that would be a good place to start.
Hi @spencer046
Please see attached for the workflow below as the solution:
Using the logic you mentioned. It leaves us with some fields only containing AMD.
You can use a filter and filter out these fields with only AMD.
Additionally, it should be asked, what is the logic to leave only 2 words? Would it be the first two words of the text? This was the logic I used.
Please mark it as a solution for your discussion post if this post answers your question.
Pedro.
Hi, I hope all is well.
I'm not very experienced yet but I believe you can add the formula tool and create a conditional statement that removed all language outside of AMD. Then use the Data Cleansing Tool to remove any whitespace. Then join any additional language through another Formula tool. If you have a a data set you can share, I can try to complete it on my end.
Best,
Jaliz Maldonado
Conditional statements include "ToString" when combining and "Replace" when removing.
@pedrodrfaria thanks a lot, this is exactly what I am looking for.
why the filter tool filters out the sole AMD names?
The reason for keeping two words is to maintain simplified version of names in a comparatively larger dataset.
@pedrodrfaria for the next phase, I have another issue. The solution works only for a particular beginning AMD. Now I have the below dataset:
AMD CO DIST CENTER
Bi Tech Industries
CZ INDUSTRIES
D I S COMPANY LTD
A AND B HOMES CO
I still want to get rid of the 2/3/4 letter words from the names, leaving the first part of the names out of the filter/formula. For example, D I S COMPANY LTD would need to become D I S COMPANY, and A AND B HOMES CO need to become A AND B HOMES.
How to adjust the workflow then? Also if you can explain the Regex tool that you used that would be great!
Hi @spencer046
Please always open a new post if you have a different question. This way, someone else that may have a similar questions will be able to find the new discussion post and the provided solution in the new post. And we that support you get credit for an additional response. Feel free to create a new post to document this question as well.
In the Regex Tool I'm using this function (\w*\s*\w*)
\w = word
\s = space
* = zero or more
so I'm looking at fields that have zero or more words before a space, then zero or more spaces before a word, and then zero or more words before a space.
Also, regarding your question, what is the logic to not consider the beginning of the field? After the first word? Only consider the last word?
Attached is a workflow that removes the last word of the field if it matches the conditional.
Is this what you are looking for?
Pedro.
Using a single formula tool:
getword([field],0) +" "
+ regex_replace(trim(
regex_replace(
substring([field],findstring([field],""))
,"\b\w{1,3}\b",'')
),"\s+",' ')
it might just work
cheers,
mark (from my iPhone)
@pedrodrfaria Thank you so much, I have created a new post already according to your suggestion. I have below questions:
1. For the RegEx (thanks for explaining the expression), how its keeping 2 words, cutting the rest out?
2. For the logic, I myself am not sure. But for the below dataset
AMD CO DIST CENTER
Bi Tech Industries
CZ INDUSTRIES
D I S COMPANY LTD
A AND B HOMES CO
Hi Tech CO DIST Center
Below type of output I am actually expecting. My dataset has a lot of versatility in the names as you can see, so need an expert opinion to sort this out.
AMD CENTER
Bi Industries
CZ INDUSTRIES
D I S COMPANY
A AND B HOMES CO
Hi CENTER
User | Count |
---|---|
17 | |
15 | |
15 | |
8 | |
5 |