
NeilR
Alteryx Alumni (Retired)

We recently released a Microsoft Kit that included a text analytics tool. This tool uses a Cortana Analytics Gallery text analytics API to provide sentiment analysis and key phrase extraction. The tool has received positive feedback, but it is limited to 10,000 records per month before you have to pay a monthly fee. Given this backdrop, I wanted to compare the Microsoft sentiment analysis capability to a couple of the open source algorithms available.

 

The Sentiment Tool

 

The first open source package I identified to try out was the R package "sentiment". The package has long been archived on CRAN but is still available for download. It was not too difficult to leverage this package inside of Alteryx: a few lines of code in the R tool were all that was needed.

 

The second package came to my attention via a Microsoft blog post. The Stanford CoreNLP project is an expansive "set of natural language analysis tools", even though (for now) I'm only interested in sentiment analysis. I was able to utilize this package in Alteryx via the Run Command tool. Whereas the "sentiment" package gives a total score for an entire block of text, the Stanford package parses each sentence and gives a separate score for each. For the sake of this tool, I averaged these scores together to give a single score for the entire text.
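To make the averaging step concrete, here is a minimal Python sketch. It is not the macro's actual implementation (the macro does this with Alteryx tools), and the function name and the 0 to 4 score scale in the example are assumptions for illustration only.

```python
from statistics import mean


def average_sentiment(sentence_scores):
    """Collapse per-sentence scores into a single score for the whole text."""
    if not sentence_scores:
        return None  # no sentences were scored, so there is nothing to average
    return mean(sentence_scores)


# Hypothetical per-sentence scores for one block of text
# (assumed scale: 0 = very negative, 4 = very positive).
scores = [3, 1, 2]
print(average_sentiment(scores))
```

A plain mean treats every sentence equally regardless of length; weighting by sentence length would be another reasonable choice.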

 

To use the tool you'll need to download each of these packages and point to where you've downloaded them as per the instructions laid out in the tool's interface. (I'm not a lawyer, but I think by forcing you to download the packages yourself it absolves me of all liability of you violating the packages' license terms.) Here's how it looks after I've configured mine:

 

Sentiment.PNG

 

The Analysis

 

I'm using Sentiment140 data for the analysis. Basically it's Twitter data that's been pre-scored according to emoticons: if a tweet contained a smiley, it's positive; a frowny, negative. (Careful if you want to use this data, as it's Twitter, so probably NSFW.) In order to do this for free (see the Microsoft API record limit above), I'm limiting the analysis to a random subset of 10,000 tweets.
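The emoticon labelling idea and the 10,000-record sample can be sketched in Python as follows. This is only an illustration: the emoticon lists, function names, and seed are assumptions, not the rules Sentiment140 actually used or the way the sample was drawn in Alteryx.

```python
import random

# Assumed emoticon lists, for illustration only.
POSITIVE = (":)", ":-)", ":D")
NEGATIVE = (":(", ":-(")


def emoticon_label(tweet):
    """Label a tweet from its emoticons; None if absent or contradictory."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None


def sample_rows(rows, k=10_000, seed=42):
    """Draw a reproducible random subset of at most k rows."""
    if len(rows) <= k:
        return list(rows)
    return random.Random(seed).sample(rows, k)


print(emoticon_label("Loving this weather :)"))
print(len(sample_rows(list(range(50_000)))))
```

Fixing the seed keeps the subset identical between runs, which matters when the same 10,000 records are sent to several scoring methods.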

 

 accuracy.png

The Microsoft algorithm came out on top for accuracy. Interestingly, it was also the fastest, even though it was leveraging an API over the web.

speed.png
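For anyone replicating the accuracy comparison, one simple definition is the share of tweets where the predicted label matches the emoticon-based label. The sketch below assumes that definition and uses made-up labels; the exact calculation behind the chart above is not spelled out in this post.

```python
def accuracy(predicted, actual):
    """Fraction of records where the predicted label equals the reference label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("no records to score")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels, purely for illustration.
predicted = ["positive", "negative", "positive", "negative"]
actual = ["positive", "negative", "negative", "negative"]
print(accuracy(predicted, actual))  # 0.75
```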

I've attached the Sentiment tool - feel free to tweak it to see if you can improve the accuracy or performance. I also encourage you to try to replicate my results, either with the Sentiment140 data or some other data source. I'll attach my analysis in the comments upon request.

Neil Ryan
Sr Program Manager, Community Content

Neil Ryan (he/him) is the Sr Manager, Community Content, responsible for the content in the Alteryx Community. He held previous roles at Alteryx including Advanced Analytics Product Manager and Content Engineer, and had prior gigs doing fraud detection analytics consulting and creating actuarial pricing models. Neil's industry experience and technical skills are wide ranging and well suited to drive compelling content tailored for Community members to rank up in their careers.

Comments
alexandra_hanna
7 - Meteor

Hi NeilR,

Hope you are well. Could you please share the configuration of the run command tool? It is my first time trying to run java through Alteryx and I need a bit of help setting it up.

Cheers,

Alexandra

NeilR
Alteryx Alumni (Retired)

@alexandra_hanna You can download the macro attached to the post and open it in Alteryx to see how the run command tool is configured. I've also pasted a screenshot of the configuration below...

Capture.PNG

MizunashiSinayu
8 - Asteroid

Hi, 

Thank you very much for the share...

However, I am facing some problems here...

 

When I use sentiment, I got this error

Error: Sentiment (2): Tool #5: Error in install.packages(package_name) : unable to install packages

 

On the other hand, when I tried the Stanford NLP, another error occurred:

Error: Sentiment (2): Tool #37: File not found "C:\Users\User\Documents\Tugas\EYSI\stanford-corenlp-full-2016-10-31\__temp_file_*.csv.out"

 

Any idea to help me? :D

 

Thanks~
Regards - Mizu

NeilR
Alteryx Alumni (Retired)

@MizunashiSinayu can you post a screenshot of how you've configured the tool? It would also help if in your workflow you go to the Runtime tab of the Configuration panel and enable "Show All Macro Messages" and relay the entire log after re-running the workflow.

MizunashiSinayu
8 - Asteroid

 

Hi,

Thank you very much for the kind reply. For the sentiment package, I found a way to solve it.


The cause of

Error: Sentiment (2): Tool #5: Error in install.packages(package_name) : unable to install packages

was that I had not granted Alteryx permission to install the package. It installs packages somewhere under Program Files/Alteryx/R 3.2.x/library

I fixed this by right-clicking the folder in Explorer > Properties > Security and granting all the permissions, and eureka, it works! I granted access to the Alteryx root folder, which saves me from many problems (but at my own risk).

 

 

 

For the Stanford package, I can only run it with the "2015-12-09" release (same as in your screenshot). They have since launched a new "2016-10-31" release, and the new release does not work.

 

 

The log is as follows:

  • Designer x64 Started running at 18/01/2017 14:35:54
  • Sentiment (2) Tool #11: 1 record was output
  • Sentiment (2) Tool #64: 0 records were output
  • Text Input (1) 1000 records were output
  • Sentiment (2) 1 records were written to "C:\Users\User\Documents\Tugas\ABCI\stanford-corenlp-full-2016-10-31\__temp_file_1.csv"
  • Sentiment (2) 1 records were written to "C:\Users\User\Documents\Tugas\ABCI\stanford-corenlp-full-2016-10-31\__temp_file_2.csv"
  • Sentiment (2) 1 records were written to "C:\Users\User\Documents\Tugas\ABCI\stanford-corenlp-full-2016-10-31\__temp_file_3.csv"
  • Sentiment (2) 3 records were written to "C:\Users\User\Documents\Tugas\ABCI\stanford-corenlp-full-2016-10-31\__temp_file_all_files.txt"
  • Sentiment (2) Tool #37: 6 records were written in total
  • Sentiment (2) Tool #37: File not found "C:\Users\User\Documents\Tugas\ABCI\stanford-corenlp-full-2016-10-31\__temp_file_*.csv.out"
  • Designer x64 Finished running in 3,4 seconds with 1 error

 

Best regards,

Mizu

blevy
5 - Atom

Hello - I'm trying to use the Stanford functionality and I'm receiving the following error even using the 2015-12-09 package...

 

Error: Sentiment (1): Tool #37: The external program "java" returned an error code: 1

 

Any assistance is greatly appreciated. :)

NeilR
Alteryx Alumni (Retired)

Hi @blevy - sorry for the late response. Can you post a screenshot of how you've configured the tool? It would also help if in your workflow you go to the Runtime tab of the Configuration panel and enable "Show All Macro Messages" and relay the entire log after re-running the workflow.

NeilR
Alteryx Alumni (Retired)

I just tried using this with the most recent version of the Stanford CoreNLP package (3.9.0 AKA stanford-corenlp-full-2018-01-31) and the macro requires a minor tweak to function properly. Open Run Command tool #37 and change the Read Results Input configuration from path\__temp_file_*.csv.out to path\__temp_file_*.csv.xml.
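As a rough illustration of why the pattern matters, the Python sketch below looks for result files under either naming pattern mentioned in this thread. It runs outside Alteryx and is not part of the macro; the function name and folder argument are hypothetical.

```python
from pathlib import Path


def find_result_files(folder, suffixes=(".csv.out", ".csv.xml")):
    """Return result files matching the first suffix that has any matches."""
    folder = Path(folder)
    for suffix in suffixes:
        matches = sorted(folder.glob(f"__temp_file_*{suffix}"))
        if matches:
            return matches
    return []  # neither pattern matched, the "File not found" situation
```

Calling `find_result_files` on the CoreNLP folder after a run shows which extension that release actually wrote, which tells you what to enter in the Read Results setting.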

ujjwalstha
5 - Atom

Thank you Neil for fixing the error. It works great now.

trettelap
8 - Asteroid

@NeilR

Great macro! I was able to get the macro working for the Stanford CoreNLP analysis with the most recent version of Alteryx and R. However, when I run the sentiment package, I get the following error. Is it possible something needs to be changed to get it to work with the current R version (3.5.3)? I apologize as I'm not entirely familiar with how everything works, but I can provide further detail as necessary.

 

Error: Sentiment (3): Tool #5: Error in classify_polarity(inputData[1], algorithm = "bayes") :

 

Thank you!

 

Edit: So I found this and it looks like it is a problem with where the repo is hosted. Would there be a way to update the script for this?

 

https://stackoverflow.com/questions/56942406/package-rstem-is-not-available-for-r-version-3-5-1

NeilR
Alteryx Alumni (Retired)

@trettelap since Rstem is no longer available, and the tool relies on that, might I suggest trying the Vader Sentiment tool created by @Nate1? It's available here.

trettelap
8 - Asteroid
Thank you Neil! That is a great tool.