Alteryx Designer Desktop Discussions

Sriram369 · ‎06-28-2021

I have a very large data set (more than 3 lakh rows) in which two columns are unstructured, meaning the same component is mentioned in various formats. The aim is to filter the data by individual component. A sample is attached for reference.

Luke_C · ‎06-28-2021

@Sriram369 What would the desired output look like?

KarolinaRoza · ‎06-28-2021

Hi,

I would start with the Summarize Tool (group by Remarks, Count of Item Name), then a Sort Tool on Count, descending.

This will let you look at the most common Remarks and maybe come up with some Filter Tools, for example Contains([Remarks],"STORE"), to create subsets of the original data and then group by specific category.
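For anyone who wants to prototype the same idea outside Designer, here is a rough Python sketch of the Summarize, Sort, and Filter steps. The sample rows are invented for illustration (only the column names Remarks and Item Name come from the thread), and the upper-casing and whitespace cleanup before counting is my own assumption, not something the Summarize Tool does for you.

```python
from collections import Counter

# Hypothetical sample rows: column names are from the thread, values are invented.
rows = [
    {"Item Name": "Bolt M8", "Remarks": "STORE ROOM A"},
    {"Item Name": "Bolt M10", "Remarks": "store  room a"},
    {"Item Name": "Gasket", "Remarks": "WORKSHOP"},
    {"Item Name": "Washer", "Remarks": "Store Room A"},
    {"Item Name": "Valve", "Remarks": "Workshop"},
    {"Item Name": "Pump", "Remarks": "YARD"},
]


def normalize(text):
    """Upper-case and collapse whitespace (an assumed cleanup step)."""
    return " ".join(text.upper().split())


# Summarize: count rows per normalized Remarks value.
remark_counts = Counter(normalize(r["Remarks"]) for r in rows)

# Sort: most frequent Remarks first.
most_common = remark_counts.most_common()


def contains(text, keyword):
    """Case-insensitive substring test, similar in spirit to a Contains filter."""
    return keyword.upper() in text.upper()


# Filter: keep only the rows whose Remarks mention STORE.
store_rows = [r for r in rows if contains(r["Remarks"], "STORE")]

for remark, count in most_common:
    print(remark, count)
```

Looking at the top of that list is usually enough to spot the first few keywords worth filtering on.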

It depends on what you need and what kind of detail you need.

Karolina

DawnDuong · ‎06-28-2021

hi @Sriram369

It feels to me that you are trying to solve a classification problem. Based on my personal experience, you need to get the “key word” list from a domain expert to narrow the field down into the key categories, and then iteratively whittle down the “residual” unmatched records.
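A minimal sketch of that loop in Python, in case it helps to see it written out. The remarks, keywords, and category names below are all made up for illustration; in practice the keyword list would come from the domain expert.

```python
# Hypothetical remarks; real data would come from the unstructured columns.
remarks = [
    "STORE ROOM A",
    "sent to workshop",
    "W/SHOP bench 2",
    "Main store",
    "yard",
]


def classify(remark, keyword_map):
    """Return the category of the first keyword found in the remark, else None."""
    upper = remark.upper()
    for keyword, category in keyword_map.items():
        if keyword in upper:
            return category
    return None


def run_pass(remarks, keyword_map):
    """Split remarks into a {remark: category} dict and an unmatched residual list."""
    matched, residual = {}, []
    for remark in remarks:
        category = classify(remark, keyword_map)
        if category is None:
            residual.append(remark)
        else:
            matched[remark] = category
    return matched, residual


# Pass 1: initial keyword list (assumed to come from a domain expert).
keywords = {"STORE": "Stores", "WORKSHOP": "Workshop"}
matched_1, residual_1 = run_pass(remarks, keywords)

# Pass 2: after reviewing the residual, add a newly spotted variant and rerun.
keywords["W/SHOP"] = "Workshop"
matched_2, residual_2 = run_pass(remarks, keywords)

print(residual_1)
print(residual_2)
```

Each pass shrinks the residual, and whatever is left at the end is the set that needs a manual look or a new category.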

If you have access to the Word Cloud tool, that may be one way to get the initial key word list.
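If the Word Cloud tool is not available, a plain word-frequency count gives a similar starting point. This is only a rough stand-in written in Python, with invented sample remarks, and it is not how the Word Cloud tool itself works.

```python
import re
from collections import Counter

# Hypothetical remarks used only to illustrate the counting.
remarks = [
    "STORE ROOM A",
    "Main store",
    "sent to workshop",
    "Workshop bench 2",
    "store / yard",
]


def tokenize(text):
    """Lower-case the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


# Count how often each word appears across all remarks.
token_counts = Counter(tok for remark in remarks for tok in tokenize(remark))

# The most frequent words are candidates for the initial keyword list.
top_words = token_counts.most_common(3)
print(top_words)
```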

dawn

Two Unstructured columns need to be organized