Can anyone give me some pointers on the best algo choices / tools, and on structuring the data / configuring the algo to steer it?
Many thanks in advance as always!
w
Hello @Dubya_dup_93
Thanks for posting to the Community!
Are you able to share your workflow with some sample data so the Community can see what you have attempted so far?
This will allow the Community to better troubleshoot directly where you are having trouble.
thanks!
TrevorS
Are you looking to do this natively in Alteryx or do you have experience with languages such as python?
I'll caveat up front that I am not a data scientist, but I have done a little exploratory project work on text analytics. It looks like you need something along the lines of topic modelling (looking at how words relate to each other to draw out themes) and named entity extraction (identifying what a word refers to from its context; for example, if you mention Apple alongside iPhones and tech, then Apple is a company and not a fruit).
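To make the Apple example concrete, here is a toy, rule-based context check in Python. It is not named entity recognition as spaCy or a trained model does it; it simply counts hypothetical "cue" words within a few tokens either side of the ambiguous word and picks the sense with the most cues. The cue lists and window size are made-up assumptions for illustration.

```python
from __future__ import annotations

import re

# Hypothetical cue words for each sense of the ambiguous term "apple".
SENSE_CUES = {
    "company": {"iphone", "ipad", "mac", "tech", "software", "store"},
    "fruit": {"eat", "juice", "orchard", "pie", "fresh", "tree"},
}


def disambiguate(text: str, term: str = "apple", window: int = 5) -> str:
    """Guess which sense of `term` is meant by counting cue words nearby."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = {sense: 0 for sense in SENSE_CUES}
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        # Up to `window` tokens before and after this occurrence of the term.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        for sense, cues in SENSE_CUES.items():
            scores[sense] += sum(1 for word in context if word in cues)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # No cues at all, or a tie between the top senses: don't guess.
    if ranked[0][1] == 0 or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return "unknown"
    return ranked[0][0]


if __name__ == "__main__":
    print(disambiguate("Apple launched a new iPhone at its tech event"))
    print(disambiguate("I ate an apple pie fresh from the orchard"))
```

A hand-built cue list like this breaks down quickly on real text, which is why pre-trained NER models are usually the better route.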
There is also a brute-force method: create a library of words, count the matches against each description, and then set a cut-off on that count that gets you to a relevant / not-relevant accuracy you are comfortable with.
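The brute-force approach could look something like the Python sketch below. The keyword library, threshold values, and labelled examples are all hypothetical stand-ins; the idea is to score each description by how many library words it contains, then try a few thresholds against descriptions you have already labelled and keep the one whose accuracy you are happy with.

```python
from __future__ import annotations

import re

# Hypothetical keyword library: terms that signal a "relevant" description.
RELEVANT_TERMS = {"cotton", "denim", "stitching", "fabric", "waterproof"}

# Hypothetical default threshold: minimum matches to call a description relevant.
THRESHOLD = 2

# Made-up examples standing in for descriptions you have already labelled.
LABELLED = [
    ("Heavy cotton denim jacket with double stitching", "relevant"),
    ("Waterproof fabric shell with taped seams", "relevant"),
    ("Free delivery on orders over 50", "not relevant"),
    ("Our cotton tote is a customer favourite", "not relevant"),
]


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def match_count(text: str, terms: set[str]) -> int:
    """Count how many tokens in the text appear in the keyword library."""
    return sum(1 for token in tokenize(text) if token in terms)


def classify(text: str, terms: set[str] = RELEVANT_TERMS,
             threshold: int = THRESHOLD) -> str:
    """Label a description relevant if its match count reaches the threshold."""
    return "relevant" if match_count(text, terms) >= threshold else "not relevant"


def accuracy(labelled: list[tuple[str, str]], threshold: int) -> float:
    """Share of labelled descriptions that the given threshold classifies correctly."""
    correct = sum(1 for text, label in labelled
                  if classify(text, threshold=threshold) == label)
    return correct / len(labelled)


if __name__ == "__main__":
    # Try a few thresholds; on this toy data, 2 classifies all four correctly.
    for t in (1, 2, 3):
        print(t, accuracy(LABELLED, t))
```

In Alteryx the same logic maps roughly onto tokenising the text, joining against a keyword list, summarising match counts per description, and applying a Filter on the count.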
Looking at stuff in Alteryx
The Intelligence Suite (IS) add-on for Alteryx can do unsupervised topic modelling, but given that you already know which descriptions are relevant and which are irrelevant, I don't think it will really help here. What IS would let you do is read in all the descriptions and, based on an LDA (Latent Dirichlet Allocation) model, score each description against a set of topics. For example, it might identify topics that talk more about how you would wear an item, versus trends, versus materials and manufacturing.
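To give a feel for what an LDA model does, here is a minimal from-scratch Python sketch that fits LDA with collapsed Gibbs sampling. It is not how Intelligence Suite implements its topic modelling; the stop-word list, sample descriptions, and parameter values (`alpha`, `beta`, iteration count, seed) are illustrative assumptions. The returned `theta` holds, for each description, a score per topic, which is the kind of description-to-topic scoring described above.

```python
from __future__ import annotations

import random
import re

# Hypothetical stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "and", "at", "for", "from", "in", "it", "of", "on",
              "or", "our", "the", "this", "to", "up", "was", "with"}

# Made-up product descriptions standing in for your data.
DESCRIPTIONS = [
    "Wear it casually with jeans or dress it up for the evening",
    "Easy to wear to the office or out for the evening",
    "Made from organic cotton and recycled polyester",
    "Cotton fabric woven and dyed at our mill",
    "This trend colour was seen on every runway this season",
    "A runway trend for the season ahead",
]


def tokenize(text: str) -> list[str]:
    """Lower-case, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def lda_gibbs(docs: list[list[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iters: int = 200, seed: int = 0):
    """Collapsed Gibbs sampling for LDA; returns (theta, phi, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    # Count tables: document-topic, topic-word, and tokens per topic.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * n_vocab for _ in range(n_topics)]
    nk = [0] * n_topics
    # Start by assigning every token to a random topic.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Weight of each topic given every other token's assignment.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_vocab * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Smoothed estimates: theta = per-description topic mix, phi = per-topic word mix.
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[t][v] + beta) / (nk[t] + n_vocab * beta) for v in range(n_vocab)]
           for t in range(n_topics)]
    return theta, phi, vocab


def top_words(phi: list[list[float]], vocab: list[str], n: int = 3) -> list[list[str]]:
    """Highest-probability words for each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


if __name__ == "__main__":
    docs = [tokenize(t) for t in DESCRIPTIONS]
    theta, phi, vocab = lda_gibbs(docs, n_topics=3)
    for k, words in enumerate(top_words(phi, vocab)):
        print(f"topic {k}: {words}")
    # Most likely topic for each description.
    for text, row in zip(DESCRIPTIONS, theta):
        print(max(range(len(row)), key=row.__getitem__), text)
```

On six short descriptions the topics will be noisy and depend on the seed; LDA only starts to separate themes like wear / trends / materials reliably on a much larger corpus. Because it is unsupervised, it also won't use your relevant / not-relevant labels, which is the limitation mentioned above.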
Not native Alteryx, but can be coded in
Named entity recognition is not included in Alteryx as a tool, but within Alteryx you could leverage either a service such as Microsoft Cognitive Services or Python packages such as spaCy.
spaCy has pre-trained models you can use (for example, ones built from labelled datasets based on Wikipedia articles), and you can also train your own model, though I've not explored this.
While these aren't specific answers, hopefully they give you some additional insight into text analytics and areas to explore.