Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Data Science

Machine learning & data science for beginners and experts alike.
patrick_digan
17 - Castor
17 - Castor

Do you subscribe to EVERY post like I do?

pd_1.jpg

 

Good. Did you see @AndyC's post about using GoogleVis in R?

pd_2.jpg

 

It's ok I'll forgive you. You can't read them all. His post was about using the power of GoogleVis in Alteryx to produce an org chart. I have no need for an org chart, but the idea of using the power of Google Charts to display my data in Alteryx seemed interesting. I've heard of this "Tableau" guy, but I'm not the one writing the checks around this place for new software.

 

After a few several too many months, I finally found some time to try my hand at GoogleVis. Basically Google Charts are pretty cool, and they kindly produced an R package for dummies like me to use. Alteryx is even cooler, and kindly produced an R tool for dummies like me to use. Now that I had these tools at my disposal, I needed to find some really good data to test it on. I wanted it to be something of great value. It should be earth shattering. Monumental. Colossus. Then it came to me:

Spoiler
Red Sox coach's tenure.

Sweet!

 

I know. Pretty awesome. As a lifelong Red Sox fan, it seemed to me like an interesting way to display the number of years for recent Red Sox coaches.

 

First I needed a list of coaches. The Red Sox made that available here . 

 

pd_4.png

 

A mere mortal may just copy and paste that it into a text input tool. But what happens when John Farrell wins his 7th world Series in a row and decides to turn the team over the next coach? You would have an out of date alteryx workflow. Instead, I decided to use a download tool and always grab the coaches list fresh at runtime. The download tool returns all the raw html. I've never met some data that I can't use RegEx for. For those not familiar with HTML, the data I want is hiding between <tr> and <td> or <th> tags. I split each <tr> (table row) into rows using the tokenize method. Then I split each <td> (table data) and <th> (table header) into 3 columns using the tokenize method. My column names are in the first row, so I use the dynamic rename to move it to the column name. I use a couple text to columns tools to split the years served into the proper format. The generic format is yyyy-yyyy, but sometimes there can be multiple sets for a coach that had more than one stint. The first text to column uses a column delimiter to split that data into separate rows. Then, I need to split the two years into separate columns using - as the delimiter. The GoogleVis timeline function I want to use needs dates, so I dropped in a formula tool to quickly change the years to dates.

 

While it would be fine to include all the data, it looked better with a smaller dataset. I used a filter tool to only use the data since 1970. I sorted the data by start date, and passed it off to the R tool where the magic happens.

 

We'll take the R code piece by piece. You will need to have the predictive analytics tools installed: http://downloads.alteryx.com/predictive.html

 

package_name <- "googleVis"
##The GoogleVis package leverages the use of Google Charts API
##https://cran.r-project.org/web/packages/googleVis/index.html
if(package_name %in% rownames(installed.packages())==FALSE){
install.packages(package_name,repos="https://cloud.r-project.org/")
print("Installed googleVis")}

First we need to make sure that GoogleVis has been downloaded and is available.

 

##Read in data
datain<- read.Alteryx("#1", mode="data.frame")

This is the syntax for reading data into the R tool. It's Alteryx way of taking the data from Connection 1 and then working with that data in R.

 

##Load up GoogleVis
library("googleVis")

Next we need to actually load the googleVis package for R to use. Even after it's been installed once, we need to load it up every time when we want to run our code.

 

##We need to convert the dates to R date format
dataformat<- data.frame(datain$Name,datain$Record,as.Date(datain$Start),as.Date(datain$End))

hile we've been careful to make sure our Start and End fields were dates, we passed the whole data set in as a dataframe; thus we need to convert the Start and End back to dates using the as.Date functionality in R.

 

##Put column names back in
colnames(dataformat) <-c("Name","Record","Start","End")

The previous step loses the column names so have to put those back in. At this point you may be correctly wondering if I really know how to code in R. I've never taken a course or read a book on the subject; instead, I've read through handy examples that other users have posted on the Alteryx Community and learned through a lot of trial and error. My R code may not be perfect, but it gets the job done!

 

##Here you can build up any of the Google Charts options. See https://cran.r-project.org/web/packages/googleVis/vignettes/googleVis_examples.html 
Size<-list(width=1500,height=1000)
Timeline=list(timeline="{groupByRowLabel:true}")
#Colors<-list(colors="['#A9D18E','#C5E0B4','#A9D18E','#FFFFFF','#000000']")
#AreaOpacity<-list(areaOpacity="1.0")
#Type<-list(seriesType="area")
#Mods<-list(series="[{},{},{},{},{type:'line'}]")

While the google charts look good out of the box, there are a host of options available. I'm setting the size and grouping my row labels in case we had a couch listed twice. See Google Charts guide, examples, and GoogleVis Cran Repo.

 

##Combine all of the options into one variable
VisOptions<-c(Size,Timeline)

Now that we have our individual options set up, we need to combine them all to pass to GoogleVis.

 

##Make the call to GoogleVis
Timeline <- gvisTimeline(data=dataformat, 
                         rowlabel="Name",
                         barlabel="Record",
                         start="Start", 
                         end="End",
                         options=VisOptions)

For our timeline chart we use the gvisTimeline function and pass the necessary values and parameters. It returns the necessary html to display the chart.

 

##Write out the HTML for alteryx to render
write.Alteryx(Timeline, 1)

Now we just pass the html out so Alteryx can further process it.

               

Phew, hopefully you made it through the R code. We need the data from both the html.header and html.chart. There are 7 rows of html.chart data, and the html.header is just repeated 7 times. We group the html.header (so 1 record comes through) and concatenate the html.chart in the Summarize tool. I rename them A and B respectively so that we can properly order them later. I use the transpose tool to stack A and B on top of each other, followed by a sort tool to make sure A comes first. I concatenate the 2 rows in the summarize tool. Before my html is complete, I need to ensure the background is white. This helps me later when I need to overlay the html and chart together. My html is complete!

 

One of the nice things about googleVis is that the charts are interactive. You can hover over things, click on things, etc; however, this poses a challenge: I need the interactive html to be displayed in the gallery, but I also need the chart to be saved from the gallery to a file like excel or pdf. The easy part is the interactive html displaying in the gallery.

 

I used @Ned’s DynamicReportText Batch Macro tool  which passes the html to a report text tool with this text: <htmlpassthrough><![CDATA[ ReplaceMe ]]></htmlpassthrough>. We use an action tool to swap out our HTML for the ReplaceMe. The harder part is to produce something that can be saved from the gallery. For this, I again relied on Ned's great work and slightly modified a macro he put together . I won't/can't explain the macro in full detail, but the concept is this: We write the html out to file in output 23 and then use the phantom.exe (freeware) in the run command 14 to render our html and save it to a png. Side Note: the phantomJS that Ned was using is from several years ago. When I couldn’t solve an issue I was having, downloading the latest PhantomJS did the trick!

 

We now have a static copy of our chart to pair with our interactive version. We join the output from our 2 macros and use an overlay tool to put the interactive version on top of the static image. We throw a layout tool in to get the border just right. Last but not least we use a render tool to display our results! The nice thing is how modular everythnig after the R tool is. Once you get your data and setup the R tool to use GoogleVis for a different chart type, the rest of the module works unchanged.Yay modularity!

 

I was really happy with the final output. I have their win-loss record displayed as well since it was available:

 

pd_5.png

 

I've attached a working copy of the entire process. It should work on your desktop or Private Gallery.