Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
TaraM
Alteryx Alumni (Retired)

 

The following post was written by Joseph Lombardi, an intern working with the Alteryx Advanced Analytics team.

 

Overview


As with any real-world data source, nothing is perfect. Things can be dynamic, unsorted, or even encrypted, so it is useful to have the tools and skills to work around these issues. Alteryx comes in handy, as it gives its users the customization and data modification tools to make this problem-solving go a lot quicker.

 

There are three main topics covered:

 

  •  Automating the download of multiple files
  • Unzipping compressed data
  • Formatting the data to be read correctly into Alteryx

 

The real world example used to address these can be found here: http://thedataweb.rm.census.gov/ftp/cps_ftp.html


The following three workflows show a working example of this, available as a module here (or at the bottom of this post) to import into Alteryx

Workflow 1

Workflow 2

Workflow 3

 

Preparation

The first step is to ensure that Alteryx has the R Tool available by going to "Help" > "Install predictive tools". This can also be done by navigating to http://downloads.alteryx.com/Latest_RInstaller.htm . It may help to become familiarized with the 'Download' tool as well.

 

This whole project can be downloaded and ran with sample entries: CPS Download Decompress Open.yxzp

 

The example that will be used is the Current Population Survey (CPS) data found on http://thedataweb.rm.census.gov/ftp/cps_ftp.html
After downloading 1 month's data zip file, extracting, and opening the data with a text editor, the data appears to be fixed-width. The site also includes human-readable dictionaries for these files. The default method for Alteryx to open fixed-width data files is with a formatted .flat file dictionary, so a transformation is necessary.

 

In summary, what needs to be accomplished in the Alteryx workflow:

 

  • Download HTML and parse the links from the site
  • Download compressed data and dictionaries
  • Extract data
  • Format .flat dictionary from readable dictionary
  • Associate dictionaries with decompressed data files

 

The R Tool will play a big roll in allowing users with some coding experience to solve many issues that accompany working with real-world and minimally structured data sources.

 

Download

The Download Tool downloads data as either an input field in Alteryx, or saves to a file. Every link in HTML is of the form

>  <a href="http://example.com/link.html"> ...

 

This helps parsing the HTML of http://thedataweb.rm.census.gov/ftp/cps_ftp.html, which can be parsed using a Text To Columns Tool to generate rows with the delimiter: **"** (essentially leaving row entries with either link addresses or unused HTML.) The result would be

 

|...  |
|----|
|<a href=|
|http ://example.com/link1.zip|
|> </div> <a href=|
|http ://example.com/link2.zip|
|...|

 

Simply using a Filter Tool to find links with desired extensions -- .zip for compressed data and .txt for dictionaries -- it becomes possible to feed these links to another Download Tool to get the desired data.

 

For further steps, it helps to use the structure of the directory naming conventions to match dictionaries with corresponding data files using the Find Replace Tool to append which dictionary goes with each .zip.


> For example,
> *.../basic/201301-/January_2013_Record_Layout.txt*
is the dictionary for all files under
>*.../basic/201301-/*

 

Making the Data Usable


Extraction
This step requires the R Tool, or some use of an external program through the Run Command Tool.
The method using the R Tool is an easy application of the unzip() command in R:

```R
filepath = read.Alteryx("#1", mode="data.frame")
##  takes in the file path after it has been made into rows by a \ delimiter
 
directory = paste(filepath[1:nrow(filepath)-1,1],collapse = "/")
##  removes the file name, and reconcatenates it into a directory path
 
directory = gsub("·"," ",pathy) ##  makes sure spaces are handled correctly
zipfile = read.Alteryx("#2", mode="data.frame")[[1,1]]
## filepath to the compressed data
 
zipfile = gsub("·"," ",zipfile) ##  makes sure spaces are handled correctly
unzip(zipfile,exdir = directory) ## uncompresses the data.
## a directory is required since the working directory is a temporary folder,
## and not the folder the module is contained in.
```


Formatting a .Flat file
Putting together an R script to do this requires knowing what makes up a .flat file. Exporting a quick sample table from Alteryx (below) gives the following .flat file:

 

|a   |b   |Breed|
|----|----|----|
|1   |2   |cat |
|3   |4   |dog |
example.flat:
```HTML
<flatfile version="1">
   <file
      path="C:\Users\Administrator\Desktop\example.asc"    ## This filepath changes
      eoltype="crlf"
      />
   <fields>
      <field name="a  " type="Byte" length="3"/>        ## These
      <field name="b" type="Byte" length="3"/>          ## Column Descriptors
      <field name="Breed" type="String" length="3"/>    ## Change
   </fields>
</flatfile>
```

So now all that is left to get the data into Alteryx is to write an R script for the R Tool that writes a file in the previous format, substituting information such as the filepath and dictionary entries. Three things must be handled: the field name, the field type, and the length of the field. A working script to do this is contained in the module file at the top of the page.

 

 

Finishing up

Once the data is readable by Alteryx, it is reopened as a .flat with the filepath handy,and saved as an Alteryx database (.yxdb) for further use.

 

*written by* Joseph Lombardi

Tara McCoy

Tara McCoy is the Creative Director at Alteryx. Since joining Alteryx in 2004, Tara has held roles in Product Management, Content Engineering, and Community where her focus has always been on delivering an amazing product experience with Alteryx. From designing product icons and community badges, procuring excellent swag, concocting engaging contests, and crowdsourcing content from internal and external Alteryx advocates alike, Tara is dedicated to the Alteryx brand and wants everyone to experience the thrill of problem solving with Alteryx.

Tara McCoy is the Creative Director at Alteryx. Since joining Alteryx in 2004, Tara has held roles in Product Management, Content Engineering, and Community where her focus has always been on delivering an amazing product experience with Alteryx. From designing product icons and community badges, procuring excellent swag, concocting engaging contests, and crowdsourcing content from internal and external Alteryx advocates alike, Tara is dedicated to the Alteryx brand and wants everyone to experience the thrill of problem solving with Alteryx.