Hello Community,
I hope that you are all doing well.
I have some REALLY dirty data that I already used the Text to Columns Tool to clean the Spirits Category and Wine Category Columns. I am now stuck with a task that is proving to be a little more difficult (at least for me) of cleaning the data on the Spirits Sub-Categories and Wine Sub-Categories columns based on the data on the Spirits and Wine Categories.
For the example below
Spirits Categories Spirits Sub-Categories
AmericanWhiskey Bourbon,Blanco/SilverTequila
I need to be able to say if Spirits Category contains AmericanWhiskey and Spirits Sub-Categories contains Bourbon then remove all other texts.
It would ultimately end up looking like this.
Spirits Categories Spirits Sub-Categories
AmericanWhiskey Bourbon,Blanco
An additional concern is when the Spirits Category is AmericanWhiskey and I need to select RyeWhiskey.
Spirits Categories Spirits Sub-Categories
AmericanWhiskey Bourbon,RyeWhiskey
As always, your time and help is always tremendously appreciated.
Fausto
Solved! Go to Solution.
Hi @Fescobar,
Assuming that the Spirits Sub-Categories are all delimited by a comma, I would suggest to split them into rows and map to a self-defined table containing the values which you want, with the corresponding importance level of the value. Please see attached my suggested solution.
Hope it helps.
Cheers,
Lelia
Thank you @ncrlelia,
You this is the second time you jump to the rescue, thank you.
I received further clarification and used the Text to Columns to split these into rows; however, since this is scraped data, it is far from being clean. This being said I am encountering a different issue now. I am seeing where the data is mismatched like the examples below.
Spirits Categories Spirits Sub-Categories
AmericanWhiskey Blanco/SilverTequila
Mezcal&Tequila Bourbon
Amaro TennesseeWhiskey
Amaro FlavoredRum
Rum Bourbon
ScotchWhisky Blanco/SilverTequila
Vodka AñejoTequila
In these cases, I'd rather make them blank or NDA. I am not certain how I can make the statement if for example "If Spirits Categories Rum does not equal Spirits Sub-Categorie AgedRum, GoldRum,SpicedRum, FlavoredRum...then NDA. The same thing for all of the other Spirits Categories. I am going to have to repeat the exercise for wines as well.
Thank you!
Fausto
Hi @Fescobar,
You're welcome. Happy that I'm able to help. 😊
Regarding the mismatched subcategories, you can Join the matched set with the original set to identify the mismatched set. Subsequently, set the subcategories value to 'NDA' or any value you want. Then union the matched and mismatched sets.
Please refer to the attached for a more detailed solution. I've created dummy data in the self-defined table.
Hope it helps.
Best Regards,
Lelia