
Data Science Blog

Machine learning & data science for beginners and experts alike.
Sr. Community Content Manager

We recently released a Microsoft Kit that included a text analytics tool. This tool uses a Cortana Analytics Gallery text analytics API to provide sentiment analysis and key phrase extraction. The tool has received positive feedback, but it is limited to 10,000 records per month before you have to pay a monthly fee. Given this backdrop, I wanted to compare the Microsoft sentiment analysis capability with a couple of the open source alternatives available.

 

The Sentiment Tool

 

The first open source package I identified to try out was the R package "sentiment". The package has long been archived on CRAN but is still available for download. It was not too difficult to leverage this package inside of Alteryx: a few lines of code in the R tool were all that was needed.

 

The second package came to my attention via a Microsoft blog post. The Stanford CoreNLP project is an expansive "set of natural language analysis tools", even though (for now) I'm only interested in sentiment analysis. I was able to utilize this package in Alteryx via the Run Command tool. Whereas the "sentiment" package gives a total score for an entire block of text, the Stanford package parses each sentence and gives a separate score for each. For the sake of this tool, I averaged these scores together to give a single score for the entire text.
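The averaging step described above can be sketched in a few lines of Python. This is only an illustration, not the macro's actual implementation: it assumes each sentence has already been reduced to a numeric score, and the function name average_sentiment is hypothetical.

```python
from statistics import mean


def average_sentiment(sentence_scores):
    """Collapse per-sentence scores into one score for the whole text.

    sentence_scores is assumed to be a list of numbers, one per sentence.
    Returns None when no sentences were scored.
    """
    if not sentence_scores:
        return None
    return mean(sentence_scores)


# Three sentences scored individually, then averaged into one value.
print(average_sentiment([3, 1, 2]))
```

A plain mean treats every sentence equally, so one strongly worded sentence in a long text has little effect on the final score. Weighting by sentence length would be one alternative.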

 

To use the tool you'll need to download each of these packages and point to where you've downloaded them, as per the instructions laid out in the tool's interface. (I'm not a lawyer, but I think that by having you download the packages yourself I'm absolved of any liability if you violate the packages' license terms.) Here's how it looks after I've configured mine:

 

Sentiment.PNG

 

The Analysis

 

I'm using Sentiment140 data for the analysis. Basically it's Twitter data that's been pre-scored according to emoticons: if a tweet contained a smiley, it's positive; a frowny, negative. (Careful if you want to use this data, as it's Twitter, so probably NSFW.) In order to do this for free (see the Microsoft API record limit above), I'm limiting the analysis to a random subset of 10,000 tweets.
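For anyone replicating this outside of Alteryx, drawing the random subset might look like the sketch below. The function name sample_records and the fixed seed are assumptions added for illustration; the original analysis did its sampling inside an Alteryx workflow.

```python
import random


def sample_records(records, k=10_000, seed=42):
    """Return a reproducible random subset of at most k records.

    A fixed seed (an assumption here) makes the subset repeatable,
    so every algorithm is scored on the same tweets.
    """
    records = list(records)
    if len(records) <= k:
        return records
    return random.Random(seed).sample(records, k)


# Stand-in data: integers instead of real tweets.
subset = sample_records(range(50_000))
print(len(subset))
```

Sampling without replacement keeps any tweet from being counted twice against the 10,000-record limit.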

 

 accuracy.png

The Microsoft algorithm came out on top for accuracy. Interestingly, it was also the fastest, even though it was leveraging an API over the web.
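Accuracy here can be read as the share of tweets where an algorithm's label matches the emoticon-based label. A minimal sketch of that calculation follows. The "pos"/"neg" label strings and the function name are hypothetical, and how each tool's numeric score gets turned into a label is not covered in this post.

```python
def accuracy(predicted, actual):
    """Fraction of predicted labels that match the reference labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one record to compute accuracy")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels for four tweets: three of the four predictions match.
print(accuracy(["pos", "neg", "pos", "neg"], ["pos", "neg", "neg", "neg"]))  # 0.75
```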

speed.png

I've attached the Sentiment tool - feel free to tweak it to see if you can improve the accuracy or performance. I also encourage you to try to replicate my results, either with the Sentiment140 data or some other data source. I'll attach my analysis in the comments upon request.

Comments
Alteryx Certified Partner

Hi NeilR,

Hope you are well. Could you please share the configuration of the Run Command tool? It is my first time trying to run Java through Alteryx and I need a bit of help setting it up.

Cheers,

Alexandra

Sr. Community Content Manager

@alexandra_hanna You can download the macro attached to the post and open it in Alteryx to see how the Run Command tool is configured. I've also pasted a screenshot of the configuration below.

Capture.PNG

Alteryx Partner

Hi, 

Thank you very much for sharing this.

However, I am running into some problems.

 

When I use the sentiment package, I get this error:

Error: Sentiment (2): Tool #5: Error in install.packages(package_name) : unable to install packages

 

On the other hand, when I tried the Stanford NLP package, another error occurred:

Error: Sentiment (2): Tool #37: File not found "C:\Users\User\Documents\Tugas\EYSI\stanford-corenlp-full-2016-10-31\__temp_file_*.csv.out"

 

Any ideas that could help me?

 

Thanks~
Regards - Mizu

Sr. Community Content Manager

@MizunashiSinayu Can you post a screenshot of how you've configured the tool? It would also help if, in your workflow, you go to the Runtime tab of the Configuration panel, enable "Show All Macro Messages", and relay the entire log after re-running the workflow.

Alteryx Partner

 

Hi,

Thank you very much for the kind reply. I found a way to solve the problem with the sentiment package.

The error

Error: Sentiment (2): Tool #5: Error in install.packages(package_name) : unable to install packages

occurred because I had not granted Alteryx permission to install the package. It installs packages somewhere under Program Files/Alteryx/R 3.2.x/library.

I fixed this by right-clicking the folder in Explorer > Properties > Security and granting full permissions, and eureka, it works! I granted access to the whole Alteryx root folder, which saves me from many problems (but at my own risk).

 

 

 

For the Stanford package, I can only run it with the "2015-12-09" release (the same one as in your screenshot). A newer "2016-10-31" release has come out, and it does not work with the tool.

 

 

The log is as follows:

  • Designer x64 Started running at 18/01/2017 14:35:54
  • Sentiment (2) Tool #11: 1 record was output
  • Sentiment (2) Tool #64: 0 records were output
  • Text Input (1) 1000 records were output
  • Sentiment (2) 1 records were written to "C:\Users\User\Documents\Tugas\ABCI\stanford-corenlp-full-2016-10-31\__temp_file_1.csv"
  • Sentiment (2) 1 records were written to "C:\Users\User\Documents\Tugas\ABCI\stanford-corenlp-full-2016-10-31\__temp_file_2.csv"
  • Sentiment (2) 1 records were written to "C:\Users\User\Documents\Tugas\ABCI\stanford-corenlp-full-2016-10-31\__temp_file_3.csv"
  • Sentiment (2) 3 records were written to "C:\Users\User\Documents\Tugas\ABCI\stanford-corenlp-full-2016-10-31\__temp_file_all_files.txt"
  • Sentiment (2) Tool #37: 6 records were written in total
  • Sentiment (2) Tool #37: File not found "C:\Users\User\Documents\Tugas\ABCI\stanford-corenlp-full-2016-10-31\__temp_file_*.csv.out"
  • Designer x64 Finished running in 3,4 seconds with 1 error

 

Best regards,

Mizu

Atom

Hello - I'm trying to use the Stanford functionality and I'm receiving the following error, even when using the 2015-12-09 package:

 

Error: Sentiment (1): Tool #37: The external program "java" returned an error code: 1

 

Any assistance is greatly appreciated.

Sr. Community Content Manager

Hi @blevy - sorry for the late response. Can you post a screenshot of how you've configured the tool? It would also help if in your workflow you go to the Runtime tab of the Configuration panel and enable "Show All Macro Messages" and relay the entire log after re-running the workflow.

Sr. Community Content Manager

I just tried using this with the most recent version of the Stanford CoreNLP package (3.9.0 AKA stanford-corenlp-full-2018-01-31) and the macro requires a minor tweak to function properly. Open Run Command tool #37 and change the Read Results Input configuration from path\__temp_file_*.csv.out to path\__temp_file_*.csv.xml.
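If you are unsure which output extension your CoreNLP version writes, a quick check outside of Alteryx can tell you which pattern to enter in the Run Command tool. The sketch below is a standalone illustration, not part of the macro. The function name find_corenlp_outputs is hypothetical, and the two suffixes are simply the ones mentioned in this thread.

```python
from pathlib import Path


def find_corenlp_outputs(folder):
    """Return CoreNLP result files in folder, trying the newer suffix first.

    The suffixes are the two seen in this thread: .csv.xml (reported for
    version 3.9.0) and .csv.out (the macro's original setting).
    """
    folder = Path(folder)
    for suffix in (".csv.xml", ".csv.out"):
        matches = sorted(folder.glob(f"__temp_file_*{suffix}"))
        if matches:
            return matches
    return []


# Example: list whatever result files exist in the current directory.
for path in find_corenlp_outputs("."):
    print(path.name)
```

Run it against the folder where the macro writes its temp files after a failed run. Whichever suffix turns up is the one to use in the Read Results setting.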

Thank you Neil for fixing the error. It works great now.