Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Text Extraction Machine Learning

theinsideguy
7 - Meteor

Hello,

 

I am trying to extract text from unstructured data purely with machine learning (I recently purchased the Intelligence Suite). My dataset looks something like the table below. Each customer writes unstructured text about two topics, topic 1 and topic 2. The topics are the same for every customer, but the way they are written is highly variable, with keywords that often appear in both topics. Instead of spending days trying to figure out NLP rules, I'd like to take a different approach and simply train a model, similar to how MonkeyLearn does it (youtube.com/watch?v=5xhvJls8b78&list=PL4yw9SBwClHQSzMHZEX4zhMvwiAKFtiE3&index=3), to tell me where the delimiter between the two topics is. What is the best way to do this in Alteryx? I've struggled to get this to work with the classification tool.

 

Customer | Unstructured Text | Desired output 1 | Desired output 2
1 | Bunch of text about topic 1 … bunch of text about topic 2 | Bunch of text about topic 1 | bunch of text about topic 2
2 | Bunch of text about topic 1 … bunch of text about topic 2 | Bunch of text about topic 1 | bunch of text about topic 2
3 | Bunch of text about topic 1 … bunch of text about topic 2 | Bunch of text about topic 1 | bunch of text about topic 2
4 | Bunch of text about topic 1 … bunch of text about topic 2 | Bunch of text about topic 1 | bunch of text about topic 2
5 | Bunch of text about topic 1 … bunch of text about topic 2 | Bunch of text about topic 1 | bunch of text about topic 2

 

Edit: What I'm really looking to find out is whether Alteryx can perform NER (named-entity recognition) using a training set.

4 REPLIES
cgoodman3
14 - Magnetar

How long is the text in your unstructured text field, and is it always written as topic 1 followed by topic 2? If so, could you split it into chunks by sentence? From 2021.2 onwards, the topic modelling tool lets you score text against a previously built topic model. Scoring each sentence, you should see the topic 1 score decrease and the topic 2 score increase as you move through the text, and you could use the crossover point as your delimiter.
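The crossover idea can be sketched in Python, for example inside a Python tool. This is only a rough illustration under assumptions: the per-sentence scores are taken as given (in practice they would come from the topic modelling tool), the sentence splitter is deliberately naive, and the function names `split_sentences`, `find_split` and `split_text` are made up for this sketch. Rather than taking the first sentence where topic 2 outscores topic 1, it picks the split index that best separates topic-1-dominant sentences from topic-2-dominant ones, which matches the crossover point when the scores cross only once and tolerates an occasional noisy sentence.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def find_split(topic1_scores, topic2_scores):
    """Return k such that sentences[:k] are treated as topic 1 and
    sentences[k:] as topic 2, maximising agreement with the scores."""
    if len(topic1_scores) != len(topic2_scores):
        raise ValueError("score lists must have the same length")
    # Positive difference means the sentence looks more like topic 1.
    diffs = [a - b for a, b in zip(topic1_scores, topic2_scores)]
    total = sum(diffs)
    best_k, best_gain = 0, -total  # k = 0: every sentence assigned to topic 2
    prefix = 0.0
    for k, d in enumerate(diffs, start=1):
        prefix += d
        # Gain = topic-1 evidence before the split + topic-2 evidence after it.
        gain = prefix - (total - prefix)
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k


def split_text(sentences, k):
    """Join the sentences on either side of the split index."""
    return " ".join(sentences[:k]), " ".join(sentences[k:])


# Hypothetical scores for a five-sentence text (not real tool output).
topic1 = [0.9, 0.8, 0.6, 0.2, 0.1]
topic2 = [0.1, 0.2, 0.4, 0.8, 0.9]
k = find_split(topic1, topic2)  # split after the third sentence
```

The two halves returned by `split_text` would then map to the "Desired output 1" and "Desired output 2" columns.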

 

NER isn’t in the Intelligence Suite yet, but I am hoping it is added soon. There are Python packages such as spaCy that you can train, and this might be an easier way to identify ‘topics’ if you know, for example, that the first one talks about people and the second one talks about organisations. The cloud providers also offer services for this. One example is this post, which uses Azure: https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Alteryx-Text-Analytics-Entities-Extrac...
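If you go the trainable-NER route, the first job is usually turning rows like the ones in the table above into labelled character spans. Here is a minimal sketch, assuming the two desired outputs appear verbatim and in order inside the raw text. The `(text, {"entities": [(start, end, label)]})` layout follows the style used in older spaCy training examples, so check what the version you use expects. The labels `TOPIC_1` and `TOPIC_2`, the function name `make_span_example` and the sample sentence are all made up for illustration.

```python
def make_span_example(text, topic1_text, topic2_text):
    """Locate both desired outputs in the raw text and return one
    training example with character-offset spans."""
    start1 = text.find(topic1_text)
    if start1 == -1:
        raise ValueError("topic 1 text not found in the raw text")
    end1 = start1 + len(topic1_text)

    # Search for topic 2 only after topic 1 ends, so the spans cannot overlap.
    start2 = text.find(topic2_text, end1)
    if start2 == -1:
        raise ValueError("topic 2 text not found after topic 1")
    end2 = start2 + len(topic2_text)

    entities = [(start1, end1, "TOPIC_1"), (start2, end2, "TOPIC_2")]
    return text, {"entities": entities}


# Made-up row standing in for one customer record.
raw = "Order arrived late. Support was friendly."
example = make_span_example(raw, "Order arrived late.", "Support was friendly.")
```

Each span can be checked by slicing the text with its offsets, which is a quick way to catch rows where the desired outputs do not match the raw text exactly.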

Chris
Check out my collaboration with fellow ACE Joshua Burkhow at AlterTricks.com
terry10
11 - Bolide

Named Entity Recognition is in beta testing now. 🙂

theinsideguy
7 - Meteor

Great news on NER! Any update on this?

cgoodman3
14 - Magnetar

NER has been added to the 2021.4 release and it has a visual output labelling each identified entity 👍

Chris