Alteryx Designer Desktop Discussions

SOLVED

How to locate line items from one field in one data set across many fields in another.

MK2456
5 - Atom

Hello Alteryx Community,

 

I believe there is a quicker and easier way to complete a task than my current method, but I am unsure what it may be.

 

My task involves two data sets, data set A and data set B. Data set A contains a single unique list of items (these could be Part Numbers/Material Descriptions/Codes), and data set B contains one or more fields/columns in which I want to locate the items from data set A. This is a completeness-checking task and allows data transfer from B to A.

 

The issue I am having is that not all items from both data sets are going to be exact matches. For example, I could have an item "ABC123" in A, but in B it could appear as "XXX-ABC123", "ABC123_XXX", etc., and in one or more fields/columns. Note that there is no consistency in how the items are entered in data set B, so no single Cleanse or Formula tool will allow for clean matching.

 

My current method has been a somewhat manual approach. I take only one field in data set B initially and use an IF Contains(field, "ABC123") formula to flag whether "ABC123" appears anywhere within that field, then manually eyeball it against the left join of data set A. If I cannot visually see it, there is no match, and "ABC123" is removed from the unique list of items from data set A that needs to be checked. The process is then repeated until the list is exhausted. Please see the workflow below for an example.

 

I then rinse and repeat this whole process for each additional field/column in data set B, using a Select tool to change which data set B field the workflow applies to.

 

 

Capture112.PNG
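(To spell out what the manual check amounts to: outside of Designer, the per-item, per-field logic is roughly the following. This is only an illustration; the column names "Item" and "Field1" are placeholders, not my real field names.)

import pandas as pd

# Illustration only: "Item" is the unique list from data set A, "Field1" is the
# single data set B column currently being checked (both names are made up).
a = pd.DataFrame({"Item": ["ABC123", "DEF456", "GHI789"]})
b = pd.DataFrame({"Field1": ["XXX-ABC123", "DEF456_XXX", "unrelated text"]})

# The manual process: test one item at a time against one field at a time.
for item in a["Item"]:
    hits = b[b["Field1"].str.contains(item, regex=False)]
    print(item, "found" if not hits.empty else "not found")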

I have played about with various tools to attempt to find a more efficient method, but so far doing this manually has proven easiest.

 

Any help would be appreciated.

 

Thank you,

 

MK

2 REPLIES
estherb47
15 - Aurora

Hi @MK2456!

 

Take a look and see if this helps. I dummied up some data below to replicate the situation you described.

 

Since the messy data set (data set B) is totally inconsistent both in how the item # is entered and in which columns it appears, it's easiest to create one long, messy field that combines all of the different columns together. I accomplished this with a Transpose tool and a Summarize tool (use concatenate; no separator needed).

Then I joined the two data sets together with an Append Fields tool. This is effectively a Cartesian join: for each unique row in data set A, you get all of the rows from data set B.

Next, a custom filter with Contains([messy data], [item #]) as the criterion. Anything coming out of the T node has a match in the B data set.

 

Finally, a join back to the clean dataset (A). Those on the L node of the join did not have matches in dataset B.
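
If it helps to see the logic outside of the canvas, here's a rough equivalent in Python/pandas. The column names (Item, Field1, Field2) are placeholders, and the data is dummied up the same way:

import pandas as pd

# Data set A: the clean, unique item list. Data set B: the messy free-text
# columns. All column names here are assumptions for the example.
a = pd.DataFrame({"Item": ["ABC123", "DEF456", "GHI789"]})
b = pd.DataFrame({"Field1": ["XXX-ABC123", "note", ""],
                  "Field2": ["", "DEF456_XXX", "misc"]})

# Transpose + Summarize equivalent: concatenate every B column into one messy field.
b["messy"] = b.fillna("").astype(str).apply(lambda row: "".join(row), axis=1)

# Append Fields equivalent: every row of A paired with every row of B.
crossed = a.merge(b[["messy"]], how="cross")

# Custom filter equivalent: Contains([messy], [Item]).
matched = crossed[crossed.apply(lambda r: r["Item"] in r["messy"], axis=1)]

# Join back to A: anything not matched anywhere in B (the "L" output of the Join).
unmatched = a[~a["Item"].isin(matched["Item"])]
print(unmatched)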

A different approach would be to Transpose and Summarize as above, then use RegEx to pull just the item # out of the messy data and do a simple join. The item # in my example is always 3 digits followed by 4 letters, so (\d{3}[[:alpha:]]{4}) works to pull out that code.
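
Purely as an illustration of that second approach (the item format is assumed from my dummy data, and Python's re syntax uses [A-Za-z] where the Alteryx RegEx tool would take [[:alpha:]]):

import pandas as pd

# "messy" is the single concatenated field from the Transpose + Summarize step.
# Format assumption from the dummy data: item codes are 3 digits then 4 letters.
a = pd.DataFrame({"Item": ["123ABCD", "456EFGH", "789IJKL"]})
b = pd.DataFrame({"messy": ["XXX-123ABCD", "no code here", "456EFGH_XXX"]})

# RegEx tool equivalent: extract the code from the messy text.
b["Item"] = b["messy"].str.extract(r"(\d{3}[A-Za-z]{4})", expand=False)

# Simple join: anything that stays "left_only" was never found in data set B.
joined = a.merge(b, on="Item", how="left", indicator=True)
print(joined.loc[joined["_merge"] == "left_only", "Item"])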

 

image.png

danrh
13 - Pulsar

Another method would be to use Append Fields to add each search term to each record individually, then a Multi-Field Formula that checks whether that particular search term is included in that particular record. Follow that with a Transpose and Summarize to aggregate the search terms back together, and you end up with a clean list of which values are/aren't included in your "B" fields:

 

image.png

 

Anything with a "0" in Sum_Value isn't present in your fields and can be removed from your list.  Hope this helps!
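
In case a non-canvas sketch is useful, this is the same idea in Python/pandas; the field names are placeholders, not your actual columns:

import pandas as pd

# Placeholder names: "Item" is the search-term list (A), Field1/Field2 are the
# free-text columns in B.
a = pd.DataFrame({"Item": ["ABC123", "DEF456", "GHI789"]})
b = pd.DataFrame({"Field1": ["XXX-ABC123", "note", ""],
                  "Field2": ["", "DEF456_XXX", "misc"]})

# Append Fields equivalent: every search term paired with every B record.
crossed = a.merge(b, how="cross")

# Multi-Field Formula equivalent: 1 if the term appears in that field, else 0.
for col in ["Field1", "Field2"]:
    crossed[col] = [
        int(term in str(val)) for term, val in zip(crossed["Item"], crossed[col])
    ]

# Transpose + Summarize equivalent: total hits per search term across all fields.
sum_value = crossed.groupby("Item")[["Field1", "Field2"]].sum().sum(axis=1)

# Anything with 0 never appears in the B fields and can come off the list.
print(sum_value[sum_value == 0])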
