Hello,
I have a workflow that tags content with multiple keywords, grouped by project. This is done with a batch macro. The workflow works fine when we tag with a small number of keywords. However, when we increase the number of keywords, the workflow runs for a long time and sometimes fails with an "unexpected error".
Is there a better way to tag multiple keywords so that the workflow runs faster and more efficiently?
Attached is a sample workflow that contains the batch macro.
Input 1 (contents)
Input 2 (keywords)
Desired output
Sincerely,
knozawa
Hi, @knozawa - are your keywords always one word (such as Alteryx or Banana), or can they be phrases (multiple words)?
If they are always one word, you could parse the [content] into words, and then join them to your keywords (rather than using the macro). I randomly created 250,000 content phrases and 100,000 keywords, and this method ran in 25 seconds on my laptop. This won't work if your keywords could be multiple words, though.
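The parse-and-join idea can be sketched outside Alteryx in Python, as a rough analogue of parsing [content] into words and joining them to the keyword list. This is a minimal sketch, not the actual workflow: the `(project, text)` row layout, the `tag_by_word_join` name, lowercase matching, and splitting on `\w+` are all assumptions for illustration. Joining per project also covers the "keywords for specific groups only" case.

```python
import re
from collections import defaultdict

def tag_by_word_join(contents, keywords):
    """Hypothetical helper: tag single-word keywords by splitting and joining.

    contents: list of (project, content) tuples  (assumed layout)
    keywords: list of (project, keyword) tuples  (assumed layout)
    Returns (project, content, keyword) rows, one per match; keywords are lowercased.
    """
    # Right side of the join: the set of keywords for each project.
    lookup = defaultdict(set)
    for project, keyword in keywords:
        lookup[project].add(keyword.lower())

    rows = []
    for project, content in contents:
        # Parse the content into lowercase words; a set tags each keyword once.
        words = set(re.findall(r"\w+", content.lower()))
        # Set intersection plays the role of the join on (project, word).
        for keyword in sorted(words & lookup.get(project, set())):
            rows.append((project, content, keyword))
    return rows
```

Because each content row is split once and each word is a hash lookup, the cost grows with the amount of text rather than with contents × keywords. As noted above, multi-word keywords never match a single parsed word, and text without spaces (e.g. Japanese) comes out as one long "word".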
@ponraj,
Thank you for your suggestion. However, I wonder whether an iterative macro would perform faster than a batch macro. Also, can I filter by keywords for specific groups only? (My desired output shows multiple keywords tagged per project.)
Sincerely,
knozawa
Thank you for your suggestion. Yes, your method is indeed much faster than using the macro. However, as you mentioned, some of my keywords are phrases. Is there a way to get similar performance when filtering by phrases?
Sincerely,
knozawa
I've attached the solution workflow.
Thank you all, especially @DanielBr, an Alteryx support engineer, who helped me solve this issue.
Batch macro vs. regex with Find Replace:
The regex with Find Replace method was 1,148 times faster because it looks up the keywords in a single pass instead of looping over them many times. It also found more matches, because some of the contents are in Japanese and have no spaces between words.
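The single-pass idea can be illustrated with Python's `re` module as a stand-in; this is not how the Alteryx Find Replace tool is implemented, and the function names are hypothetical. All keywords are compiled into one alternation, so each content string is scanned once no matter how many keywords there are. Sorting longest-first matters because Python tries alternatives left to right, so at a given position the longest fitting keyword wins.

```python
import re

def build_keyword_pattern(keywords):
    """Compile all keywords into one alternation, longest keyword first."""
    unique = sorted({k for k in keywords if k}, key=len, reverse=True)
    if not unique:
        # An empty alternation would match the empty string everywhere.
        raise ValueError("need at least one non-empty keyword")
    # re.escape makes characters such as '.' or '+' match literally.
    return re.compile("|".join(re.escape(k) for k in unique))

def tag_single_pass(content, pattern):
    # findall scans the content once, left to right, without overlaps.
    return sorted(set(pattern.findall(content)))

pattern = build_keyword_pattern(["東京", "東京タワー", "スカイツリー"])
print(tag_single_pass("東京タワーとスカイツリー", pattern))
```

No word splitting is involved, which is why this style of matching still finds keywords inside Japanese text that has no spaces.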
Two takeaways when using the regex with Find Replace method:
1. When the contents contain several overlapping keywords, only the longest keyword matches. For example:
Contents = John Aaron Smith
Keyword 1 = John
Keyword 2 = John Aaron
Keyword 3 = John Aaron Smith
Here only Keyword 3 is tagged.
2. Some keywords can produce false matches (e.g., "app" matches "appliances" and "applications", not only "app").
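The false matches in the second takeaway can be reproduced with Python's `re` module as a stand-in. The sketch below is an assumption-laden illustration, not the Alteryx tool: `compile_keywords` and its hypothetical `whole_words` option wrap the alternation in `\b` word boundaries. That removes the "app" inside "applications" for space-separated text, but it also stops matching inside Japanese text, where every character counts as a word character.

```python
import re

def compile_keywords(keywords, whole_words=False):
    # Escape each keyword and put longer keywords first in the alternation.
    body = "|".join(re.escape(k) for k in sorted(set(keywords), key=len, reverse=True))
    if whole_words:
        # Require a word boundary on both sides of whichever keyword matches.
        body = r"\b(?:" + body + r")\b"
    return re.compile(body)

def matched_keywords(content, pattern):
    return sorted(set(pattern.findall(content)))

text = "New applications and appliances"
print(matched_keywords(text, compile_keywords(["app"])))                          # ['app'] (false match)
print(matched_keywords(text, compile_keywords(["app"], whole_words=True)))        # []
print(matched_keywords("東京タワー", compile_keywords(["東京"], whole_words=True)))  # [] (missed)
```

So word boundaries trade one problem for another, which is consistent with the batch macro remaining the safer choice for short keywords in space-separated text.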
As a result:
The batch macro method is suitable when:
1. Multiple keywords should be tagged for the same content match (e.g., Keywords 1, 2, and 3 should all be tagged for contents = "John Aaron Smith")
2. The keyword list contains short keywords (e.g., "app")
The regex with Find Replace method is suitable when:
1. There are many keywords in the list to match
2. Words are not separated by spaces (e.g., Japanese or Chinese)
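The trade-off in the lists above can be sketched in Python. This is only an analogy with hypothetical function names: `tag_each_keyword` tests every keyword separately, roughly like one batch-macro iteration per keyword, while `tag_combined_pattern` uses one longest-first alternation, roughly like the single-pass regex approach.

```python
import re

def tag_each_keyword(content, keywords):
    # One substring test per keyword: every overlapping keyword is kept,
    # but the content is scanned once per keyword.
    return sorted({k for k in keywords if k in content})

def tag_combined_pattern(content, keywords):
    # One combined pattern, longest keyword first: a single scan,
    # but only the longest keyword is reported where keywords overlap.
    ordered = sorted(set(keywords), key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    return sorted(set(pattern.findall(content)))

keywords = ["John", "John Aaron", "John Aaron Smith"]
print(tag_each_keyword("John Aaron Smith", keywords))      # ['John', 'John Aaron', 'John Aaron Smith']
print(tag_combined_pattern("John Aaron Smith", keywords))  # ['John Aaron Smith']
```

The per-keyword version's cost grows with the number of keywords, which matches the slowdown described in the original question.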
I hope this helps people with similar use cases.
Sincerely,
knozawa
Hi @knozawa ,
Thanks for posting the approach you used! Also, I believe @MarqueeCrew is working on a new macro that helps with this.
@MarqueeCrew , can you jump in - hopefully I haven't got the wrong end of the stick?
Best,
Tom
@knozawa - thanks so much for reporting back with your solution (and providing the details on when each approach may/may not work for different scenarios)!