Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Separating rows with inventor name from a mixture of company and inventor names

KEP
5 - Atom

Hello everyone.  I'm working with patent data that has assignee/applicant column with a mixture of company and inventor names.  I'm only interested in names of the company not the inventors'.  Is there any way to separate out/ remove inventor names?  My data looks something like the attached screenshot.

 

Thank you for your help!

4 REPLIES 4
Inactive User
Not applicable

Your challenge is having names and companies in the same field. However, if you take a text to columns and parse using commas into 4 columns that will split everything up.

 

From there, we can see that your first parsed column will either have someone's last name or a company. On the rows with companies you will have at least 3 commas. So after you do this parsing,

create a formula field that does the following:

 

IF !IsNull([Field_Parse3]) THEN 'Company' ELSE 'Inventor' ENDIF

 

Then filter for only Company records after that and Summarize/Group By Field_Parse1 which will contain your company name. 

KEP
5 - Atom

Hi Ryan,

 

Thank you for your help!  Filtering out by cells with more than 2 commas helped parse about 50% of the total data, but there are some inventor entries with more than 2 commas, for instance, "John Johnson, San Diego, CA, US" (Basically most of US inventor has 3 commas).  To make matters worse, some companies have zero, one, or two commas, such as  "JBL Incorporated".  Essentially the data cannot be parsed by number of commas, periods, or length due to the randomness.  

 

Through research I found that Alteryx has fuzzy match tool that can recognize nickname from a actual name, so I was wondering if there's similar tool to recognize names in the data.

 

Any help?

 

 

Inactive User
Not applicable

Yeah if you cannot find a pattern after splitting the commas into columns that would be the next best option. Best way to configure that is to first get a master list of companies. Then you need to take that current stream you have with the data and do the text to columns based on comma (make sure every comma is parsed; can do 5-6 to be safe). Then, transpose those records by grouping on the record/transaction ID. From there, filter to remove the nulls and now you have your list to work with.

 

To do the fuzzy match, you will need a source and record ID field, for both the master list and the normal list you just made. Source can be "Master" or "Data" for example. Record ID can be done using the Record ID tool just make sure the master list starts at 100000 so it is easy to differentiate the two. And finally, make sure the name/value field is the same for both Master and Data. The next step is to union those together and then put the fuzzy match on. The fuzzy match will ask for record, source, and the field to do the match on (value). Make sure to check output score and then once that runs through you will have your matches. With your matches you just need to join the IDs back to the original data set and you will be good to go.

 

Note fuzzy matching is not always 100% accurate and you will have mismatches most likely still. The other alternative is to use a dynamic formula (CreW Macro Pack) and feed the master list into the formula and do a Contains function to validate Company (more of a data mining function).

KEP
5 - Atom

Thanks for giving me some ideas.  I will try out!

Labels